Data mining can extract important knowledge from large data collections, but these collections are sometimes split among several parties. Data warehousing, which brings data from multiple sources under a single authority, increases the risk of privacy violations. Furthermore, privacy concerns may prevent the parties from directly sharing even some metadata.
Distributed data mining and processing offer a way to address this issue, particularly if queries are processed so that nothing is disclosed beyond the final result. This chapter describes methods for mining horizontally partitioned data without violating privacy and discusses how to use the data mining results in a privacy-preserving way. The methods described here incorporate cryptographic techniques to minimize the information shared while adding as little overhead as possible to the mining and processing task.
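To make the idea of disclosing only the final result concrete, the following is a minimal sketch of a ring-based secure sum, a building block commonly described in the privacy-preserving distributed data mining literature. It is an illustration under stated assumptions, not necessarily a protocol taken from this chapter: the parties are assumed to be semi-honest and non-colluding, every local value and the true total are assumed to be smaller than `modulus`, and the whole exchange is simulated inside one process. The function name `secure_sum` and its parameters are hypothetical.

```python
import secrets


def secure_sum(local_values, modulus):
    """Simulate a ring-based secure sum over the parties' local values.

    Assumptions (illustrative only): semi-honest parties, no collusion,
    and a true total smaller than `modulus`.
    """
    if not local_values:
        raise ValueError("at least one party is required")
    if any(not 0 <= v < modulus for v in local_values):
        raise ValueError("each local value must lie in [0, modulus)")

    # The initiating party draws a uniformly random mask known only to itself.
    mask = secrets.randbelow(modulus)

    # The initiator adds its own value to the mask and passes the result on.
    running = (mask + local_values[0]) % modulus

    # Each remaining party adds its value to the masked running total.
    # Because the mask is uniform, the value a party receives looks random
    # and reveals nothing about the earlier parties' inputs.
    for value in local_values[1:]:
        running = (running + value) % modulus

    # The total returns to the initiator, which removes the mask.
    return (running - mask) % modulus


# Three sites, each holding a local count over its own rows
# (for example, how many local records contain a given itemset).
total = secure_sum([12, 7, 30], modulus=1000)
```

In a horizontally partitioned setting, where each party holds different records over the same attributes, such a sum lets the parties learn a global count without any party revealing its local count. The no-collusion assumption matters: two neighbours of a party on the ring could subtract what they sent from what they received and recover that party's value.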
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Kantarcioglu, M. (2008). A Survey of Privacy-Preserving Methods Across Horizontally Partitioned Data. In: Aggarwal, C.C., Yu, P.S. (eds) Privacy-Preserving Data Mining. Advances in Database Systems, vol 34. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-70992-5_13
DOI: https://doi.org/10.1007/978-0-387-70992-5_13
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-70991-8
Online ISBN: 978-0-387-70992-5
eBook Packages: Computer Science (R0)