Abstract
The BIRCH algorithm, introduced by Zhang et al. [15], is a well-known algorithm for effectively finding clusters in a large data set. Its two major components are CF tree construction and global clustering. However, BIRCH is designed to work on a single database. We propose the first method for running BIRCH over vertically partitioned data sets distributed across two different databases in a privacy-preserving manner. We first provide efficient solutions to cryptographic primitives such as finding the index of the minimum in a vector sum and checking whether the sum of two private values exceeds a given threshold. We then use these primitives as basic tools to arrive at secure solutions to CF tree construction and single-link clustering for implementing the BIRCH algorithm.
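To make the two primitives named in the abstract concrete, the following is a minimal plaintext sketch of what they compute. It contains no cryptography and is not the paper's protocol: it only shows the input/output behaviour that a secure version would have to reproduce without revealing either party's inputs. The function names are hypothetical. The link to BIRCH shown here is an assumption: with vertically partitioned data, a squared Euclidean distance splits into one partial sum per party, so the nearest entry is the minimum index of a vector sum.

```python
from typing import List, Sequence


def partial_sq_dists(point: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> List[float]:
    """Squared distances computed over ONE party's attributes only.

    `point` and each centroid hold just the columns that this party owns.
    """
    return [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]


def min_index_of_vector_sum(x: Sequence[float], y: Sequence[float]) -> int:
    """Plaintext reference: the index i that minimises x[i] + y[i].

    In a private setting x belongs to one party and y to the other;
    this version simply adds them in the clear (NOT secure).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    totals = [a + b for a, b in zip(x, y)]
    return min(range(len(totals)), key=totals.__getitem__)


def sum_exceeds_threshold(a: float, b: float, threshold: float) -> bool:
    """Plaintext reference: does a + b exceed the threshold? (NOT secure)."""
    return a + b > threshold


# Illustrative data (made up): party A owns the first two attributes,
# party B owns the third. Three candidate centroids are compared.
dists_a = partial_sq_dists([1, 2], [[0, 0], [1, 2], [5, 5]])
dists_b = partial_sq_dists([3], [[0], [4], [5]])

nearest = min_index_of_vector_sum(dists_a, dists_b)
too_far = sum_exceeds_threshold(dists_a[nearest], dists_b[nearest], threshold=2)
```

In this sketch `nearest` identifies the closest candidate and `too_far` indicates whether the combined distance exceeds a threshold, which are the two kinds of decision the abstract attributes to the primitives.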
References
Agrawal, D., Aggarwal, C.C.: On the Design and Quantification of Privacy Preserving Data Mining Algorithms. In: Proceedings of the Twentieth ACM SIGACT - SIGMOD - SIGART Symposium on Principles of Database Systems, May 21-23, 2001, pp. 247–255. ACM, Santa Barbara (2001)
Agrawal, R., Srikant, R.: Privacy preserving data mining. In: Proceedings of the 2000 ACM SIGMOD Conference on Management of Data, Dallas, TX, May 14-19, 2000. ACM Press, New York (2000)
Goethals, B., Laur, S., Lipmaa, H., Mielikainen, T.: On private scalar product computation for privacy-preserving data mining. In: Park, C.-s., Chee, S. (eds.) ICISC 2004. LNCS, vol. 3506, pp. 104–120. Springer, Heidelberg (2005)
Cachin, C.: Efficient private bidding and auctions with an oblivious third party. In: Proceedings of 6th ACM Computer and communications security, SIGSAC, pp. 120–127. ACM Press, New York (1999)
Damgard, I., Jurik, M.: A Generalisation, a Simplification and Some Applications of Paillier’s Probabilistic Public-Key System. In: Kim, K.-c. (ed.) PKC 2001. LNCS, vol. 1992, pp. 119–136. Springer, Heidelberg (2001)
Jagannathan, G., Pillaipakkamnatt, K., Wright, R.N.: A New Privacy-Preserving Distributed k-Clustering Algorithm. In: Proceedings of the 2006 SIAM International Conference on Data Mining (SDM) (2006)
Jagannathan, G., Wright, R.N.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005, Chicago, Illinois, USA, August 21-24, 2005. ACM, New York (2005)
Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data, ch. 3. Prentice-Hall Inc., Englewood Cliffs (1988)
Lindell, Y., Pinkas, B.: Privacy preserving data mining. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 36–54. Springer, Heidelberg (2000)
Natan, R.B.: Implementing Database Security and Auditing, ch. 11. Elsevier, Amsterdam (2005)
Oliveira, S., Zaiane, O.R.: Privacy preserving clustering by data transformation. In: Proceedings of the 18th Brazilian Symposium on Databases, pp. 304–318 (2003)
Paillier, P.: Public-key Cryptosystems Based on Composite Degree Residuosity Classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 223–238. Springer, Heidelberg (1999)
Rivest, R., Adleman, L., Dertouzos, M.: On data banks and privacy homomorphisms. In: Foundations of Secure Computation, pp. 169–178. Academic Press, London (1978)
Jha, S., Kruger, L., McDaniel, P.: Privacy Preserving Clustering. In: di Vimercati, S.d.C., Syverson, P.F., Gollmann, D. (eds.) ESORICS 2005. LNCS, vol. 3679, pp. 397–417. Springer, Heidelberg (2005)
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: Proceedings of the ACM SIGMOD Conference on Management of Data, Montreal, Canada, pp. 103–114 (June 1996)
Vaidya, J., Clifton, C.: Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, July 23-26, 2002, pp. 639–644. ACM, New York (2002)
Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24-27, 2003. ACM, New York (2003)
Yao, A.C.: Protocols for secure computation. In: Proceedings of 23rd IEEE Symposium on Foundations of Computer Science, pp. 160–164. IEEE Computer Society Press, Los Alamitos (1982)
Yao, A.C.: How to generate and exchange secrets. In: Proceedings of the 27th IEEE Symp. on Foundations of Computer Science, Toronto, Ontario, Canada, October 27 - 29, 1986, pp. 162–167 (1986)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prasad, P.K., Rangan, C.P. (2006). Privacy Preserving BIRCH Algorithm for Clustering over Vertically Partitioned Databases. In: Jonker, W., Petković, M. (eds) Secure Data Management. SDM 2006. Lecture Notes in Computer Science, vol 4165. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11844662_7
DOI: https://doi.org/10.1007/11844662_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38984-2
Online ISBN: 978-3-540-38987-3
eBook Packages: Computer Science (R0)