A Survey of Privacy-Preserving Methods Across Vertically Partitioned Data

  • Jaideep Vaidya
Part of the Advances in Database Systems book series (ADBS, volume 34)

The goal of data mining is to extract or “mine” knowledge from large amounts of data. However, data is often collected by several different sites. Privacy, legal and commercial concerns restrict centralized access to this data, thus derailing data mining projects. Recently, there has been growing focus on finding solutions to this problem. Several algorithms have been proposed that do distributed knowledge discovery, while providing guarantees on the non-disclosure of data. Vertical partitioning of data is an important data distribution model often found in real life. Vertical partitioning or heterogeneous distribution implies that different features of the same set of data are collected by different sites. In this chapter we survey some of the methods developed in the literature to mine vertically partitioned data without violating privacy and discuss challenges and complexities specific to vertical partitioning.
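As a minimal illustration of the vertical partitioning described above, the sketch below has two sites holding different attributes for the same set of entities. The site names, attribute names, record values, and the `joint_count` helper are hypothetical and chosen only for illustration. The count is computed in the clear, so this is not a privacy-preserving protocol. It only shows why a joint statistic needs columns from both sites, which is the computation the surveyed methods aim to perform without disclosing the underlying data.

```python
# Hypothetical toy data: the same three entities (IDs 1-3), observed by two
# sites that each hold a different attribute of those entities.
site_a = {1: {"bought_bread": 1}, 2: {"bought_bread": 1}, 3: {"bought_bread": 0}}
site_b = {1: {"bought_milk": 1}, 2: {"bought_milk": 0}, 3: {"bought_milk": 1}}


def joint_count(a, b, attr_a, attr_b):
    """Count entities for which both binary attributes equal 1.

    This is a plain (non-private) computation: it reads both sites' data
    directly, which is exactly what a privacy-preserving method must avoid.
    """
    shared_ids = sorted(a.keys() & b.keys())  # entities known to both sites
    # For 0/1 attributes the joint count is a scalar product of two columns.
    return sum(a[i][attr_a] * b[i][attr_b] for i in shared_ids)


count = joint_count(site_a, site_b, "bought_bread", "bought_milk")
print(count)
```

Neither site can compute this count from its own column alone, because the answer depends on how the two columns line up entity by entity.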


Keywords: vertically partitioned data, privacy-preserving data mining





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Jaideep Vaidya (1)
  1. MSIS Department and CIMIC, Rutgers University, Clarion, USA
