A Survey of Privacy-Preserving Methods Across Horizontally Partitioned Data
Data mining can extract important knowledge from large data collections, but sometimes these collections are split among various parties. Data warehousing, which brings data from multiple sources under a single authority, increases the risk of privacy violations. Furthermore, privacy concerns may prevent the parties from directly sharing even some metadata.
Distributed data mining and processing provide a means to address this issue, particularly if queries are processed in a way that avoids the disclosure of any information beyond the final result. This chapter describes methods to mine horizontally partitioned data without violating privacy and discusses how to use the data mining results in a privacy-preserving way. The methods described here incorporate cryptographic techniques to minimize the information shared, while adding as little overhead as possible to the mining and processing task.
Keywords: privacy, distributed data mining, horizontally partitioned data, homomorphic encryption
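As an illustration of what "no disclosure beyond the final result" can mean for horizontally partitioned data, the following is a minimal sketch of a secure sum built on additive secret sharing. It is not taken from the chapter and is not claimed to be one of its protocols. The names `MODULUS`, `make_shares`, and `secure_sum` are hypothetical, all parties are simulated inside one process, and the sketch assumes semi-honest parties connected by private channels.

```python
import secrets

# Hypothetical public modulus; assumed to exceed any possible true total.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that add up to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(local_counts, modulus=MODULUS):
    """Simulate a sum in which each party reveals only random-looking subtotals."""
    n = len(local_counts)
    # Party i splits its private count; share j would be sent to party j.
    all_shares = [make_shares(count, n, modulus) for count in local_counts]
    # Party j adds the shares it received and publishes only that subtotal.
    partials = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # The published subtotals combine to the true total and nothing else.
    return sum(partials) % modulus


# Example: three sites each hold a local support count for the same itemset.
local_counts = [120, 45, 310]
total = secure_sum(local_counts)
print(total)
```

Each individual share is uniformly random, so a party learns nothing about another party's count beyond what the final total itself implies. A real deployment would also have to address communication, collusion, and malicious behavior, which this sketch ignores.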