Abstract
The freedom and transparency of information flow on the Internet have heightened privacy concerns. Given a set of data items, clustering algorithms group similar items together. Clustering has many applications, such as customer-behavior analysis, targeted marketing, forensics, and bioinformatics. In this paper, we present the design and analysis of a privacy-preserving k-means clustering algorithm in which only the cluster means at the various steps of the algorithm are revealed to the participating parties. The crucial step in our privacy-preserving k-means is the privacy-preserving computation of cluster means. We present two protocols for this computation, one based on oblivious polynomial evaluation and the other based on homomorphic encryption. We have implemented our algorithm in Java. Using this implementation, we have performed a thorough evaluation of our privacy-preserving clustering algorithm on three data sets. Our evaluation demonstrates that privacy-preserving clustering is feasible: our homomorphic-encryption-based algorithm finished clustering a large data set in approximately 66 seconds.
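To make the cluster-mean step more concrete, the following is a minimal sketch of how two parties might compute a joint mean with an additively homomorphic scheme. It is an illustration under stated assumptions, not the protocol from the paper: the abstract does not specify the scheme or the message flow, so the toy Paillier-style encryption, the blinding factor `z`, the tiny key size, and every function name here (`keygen`, `encrypt`, `decrypt`, `add`, `scale`, `joint_mean`) are hypothetical choices. Each party is assumed to hold the sum and the count of its own points in one cluster, as non-negative integers for a single coordinate.

```python
import math
import secrets
from fractions import Fraction

# Assumption: toy primes (two Mersenne primes), far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1

def keygen(p=P, q=Q):
    """Return (public modulus n, private key) for a toy Paillier-style scheme."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add(n, c1, c2):
    """Ciphertext of m1 + m2, computed without decrypting."""
    return (c1 * c2) % (n * n)

def scale(n, c, k):
    """Ciphertext of k * m, computed without decrypting."""
    return pow(c, k, n * n)

def joint_mean(sum_a, count_a, sum_b, count_b):
    """Simulate both parties in one process; returns the exact joint mean."""
    # Party A: generate keys, send encrypted sum and count to party B.
    n, priv = keygen()
    enc_sum_a, enc_count_a = encrypt(n, sum_a), encrypt(n, count_a)

    # Party B: add its own values under encryption, then blind both
    # totals with the same random nonzero factor z (hypothetical step).
    z = secrets.randbelow(2**32 - 1) + 1
    enc_zs = scale(n, add(n, enc_sum_a, encrypt(n, sum_b)), z)
    enc_zn = scale(n, add(n, enc_count_a, encrypt(n, count_b)), z)

    # Party A: decrypt the blinded totals; z cancels in the ratio.
    zs, zn = decrypt(n, priv, enc_zs), decrypt(n, priv, enc_zn)
    return Fraction(zs, zn)
```

For example, `joint_mean(10, 4, 20, 6)` combines a sum of 10 over 4 points with a sum of 20 over 6 points. The sketch needs Python 3.8 or later for the modular inverse via `pow`. In this version party A sees the blinded totals `z * sum` and `z * count` rather than the raw totals; what those blinded values can still leak would need careful analysis in any real protocol.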
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jha, S., Kruger, L., McDaniel, P. (2005). Privacy Preserving Clustering. In: di Vimercati, S.d.C., Syverson, P., Gollmann, D. (eds) Computer Security – ESORICS 2005. ESORICS 2005. Lecture Notes in Computer Science, vol 3679. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11555827_23
DOI: https://doi.org/10.1007/11555827_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28963-0
Online ISBN: 978-3-540-31981-8
eBook Packages: Computer Science (R0)