A Survey of Randomization Methods for Privacy-Preserving Data Mining

  • Charu C. Aggarwal
  • Philip S. Yu
Part of the Advances in Database Systems book series (ADBS, volume 34)

A well-known method for privacy-preserving data mining is randomization. In randomization, noise is added to the data so that the values of individual records are masked. The aggregate behavior of the data distribution can nevertheless be reconstructed by subtracting out the noise from the perturbed data. The reconstructed distribution is often sufficient for a variety of data mining tasks such as classification. This chapter provides a survey of the randomization method for privacy-preserving data mining.
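As a rough illustration of the idea described in the abstract, the following minimal Python sketch perturbs each record with independent noise and then recovers aggregate statistics of the original data. It rests on assumptions that are not taken from the chapter: the noise is zero-mean Gaussian with a publicly known standard deviation, and only the mean and variance are reconstructed (using the fact that means and variances add for independent variables), rather than the full distribution. The function names `perturb` and `reconstruct_moments` are hypothetical.

```python
import random
import statistics


def perturb(values, noise_std, rng):
    """Return each value plus independent zero-mean Gaussian noise (assumed noise model)."""
    return [v + rng.gauss(0.0, noise_std) for v in values]


def reconstruct_moments(perturbed, noise_std):
    """Estimate the mean and variance of the original data from perturbed data.

    For independent noise Y added to data X, Z = X + Y satisfies
    E[Z] = E[X] + E[Y] and Var[Z] = Var[X] + Var[Y]; with E[Y] = 0 the
    noise variance is simply subtracted out.
    """
    mean_est = statistics.fmean(perturbed)
    # Clamp at zero because sampling error can push the difference slightly negative.
    var_est = max(statistics.pvariance(perturbed) - noise_std ** 2, 0.0)
    return mean_est, var_est


rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = [rng.uniform(20, 60) for _ in range(20000)]  # synthetic attribute values
released = perturb(original, noise_std=15.0, rng=rng)

mean_est, var_est = reconstruct_moments(released, noise_std=15.0)
true_mean = statistics.fmean(original)
true_var = statistics.pvariance(original)

# Individual records are heavily distorted, while the aggregates stay close.
avg_distortion = statistics.fmean(abs(z - x) for z, x in zip(released, original))
print(f"mean: true={true_mean:.2f} estimated={mean_est:.2f}")
print(f"variance: true={true_var:.2f} estimated={var_est:.2f}")
print(f"average per-record distortion: {avg_distortion:.2f}")
```

In this sketch the data miner sees only `released` and the noise parameters, never `original`. Reconstructing a full distribution instead of two moments would need a more elaborate estimator than the one shown here.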


Keywords: Randomization, privacy quantification, perturbation





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Charu C. Aggarwal (1)
  • Philip S. Yu (2)
  1. IBM Thomas J. Watson Research Center, Hawthorne, USA
  2. Department of Computer Science, University of Illinois at Chicago, Chicago, USA
