Privacy Preserving Clustering

Jha, Somesh; Kruger, Luis; McDaniel, Patrick

doi:10.1007/11555827_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3679)

Included in the following conference series:

European Symposium on Research in Computer Security

Abstract

The freedom and transparency of information flow on the Internet have heightened concerns about privacy. Given a set of data items, clustering algorithms group similar items together. Clustering has many applications, such as customer-behavior analysis, targeted marketing, forensics, and bioinformatics. In this paper, we present the design and analysis of a privacy-preserving k-means clustering algorithm in which only the cluster means at the various steps of the algorithm are revealed to the participating parties. The crucial step in our privacy-preserving k-means is the privacy-preserving computation of cluster means. We present two protocols for this computation, one based on oblivious polynomial evaluation and the second based on homomorphic encryption. We have a Java implementation of our algorithm. Using our implementation, we have performed a thorough evaluation of our privacy-preserving clustering algorithm on three data sets. Our evaluation demonstrates that privacy-preserving clustering is feasible: for example, our homomorphic-encryption-based algorithm finished clustering a large data set in approximately 66 seconds.
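
To make the idea of computing a cluster mean under additively homomorphic encryption more concrete, the following is a minimal Python sketch. It is not the protocol from the paper: it assumes two parties that each hold a local sum and a local count for one cluster coordinate (non-negative integers below 2**60), uses a toy Paillier cryptosystem with small fixed primes that are not secure, and the names keygen, encrypt, decrypt, add, scale and joint_mean are hypothetical. The blinding factor z only illustrates the idea of hiding the raw totals; it is not a security analysis.

```python
import math
import secrets
from fractions import Fraction

# Toy Paillier parameters: two fixed Mersenne primes, chosen for readability.
# They are far too small for real use (assumption of this sketch).
P, Q = 2**31 - 1, 2**61 - 1


def keygen(p=P, q=Q):
    """Return the public modulus n and the private pair (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def _random_unit(n):
    """Pick a random r in [1, n) with gcd(r, n) = 1."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r


def encrypt(n, m):
    n2 = n * n
    return (pow(n + 1, m, n2) * pow(_random_unit(n), n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n


def add(n, c1, c2):
    """Ciphertext of m1 + m2, computed without decrypting."""
    return (c1 * c2) % (n * n)


def scale(n, c, k):
    """Ciphertext of k * m, computed without decrypting."""
    return pow(c, k, n * n)


def joint_mean(sum_a, count_a, sum_b, count_b):
    # Party A: generate keys and encrypt its local sum and count.
    n, priv = keygen()
    enc_sum_a, enc_count_a = encrypt(n, sum_a), encrypt(n, count_a)

    # Party B: add its own values under encryption, then multiply both
    # totals by the same random z so A does not see them directly.
    z = secrets.randbelow(2**32) + 1
    enc_num = scale(n, add(n, enc_sum_a, encrypt(n, sum_b)), z)
    enc_den = scale(n, add(n, enc_count_a, encrypt(n, count_b)), z)

    # Party A: decrypt the blinded totals; z cancels in the ratio.
    num = decrypt(n, priv, enc_num)
    den = decrypt(n, priv, enc_den)
    return Fraction(num, den)
```

For example, joint_mean(10, 2, 20, 3) returns Fraction(6, 1), the mean of five points whose coordinate values sum to 30. In a k-means iteration, a computation of this kind would be repeated for each coordinate of each cluster.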

Author information

Authors and Affiliations

Computer Sciences Department, University of Wisconsin, Madison, WI, USA
Somesh Jha & Luis Kruger
Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
Patrick McDaniel

Editor information

Editors and Affiliations

Dipartimento di Tecnologie dell’Informazione, Università degli Studi di Milano, Via Bramante 65, 26013, Crema (CR), Italy
Sabrina de Capitani di Vimercati
Naval Research Laboratory, USA
Paul Syverson
Institute for Security in Distributed Applications, Hamburg University of Technology, 21071, Hamburg, Germany
Dieter Gollmann

Cite this paper

Jha, S., Kruger, L., McDaniel, P. (2005). Privacy Preserving Clustering. In: di Vimercati, S.d.C., Syverson, P., Gollmann, D. (eds) Computer Security – ESORICS 2005. ESORICS 2005. Lecture Notes in Computer Science, vol 3679. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11555827_23

DOI: https://doi.org/10.1007/11555827_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28963-0
Online ISBN: 978-3-540-31981-8
eBook Packages: Computer Science, Computer Science (R0)
