Abstract
Document clustering is a powerful data mining technique for analyzing and structuring large collections of text or hypertext documents. Many organizations and companies want to share documents on similar themes for joint benefit. However, sharing without regard for privacy risks leaking sensitive information. In this paper, we propose a cryptography-based framework for privacy-preserving document clustering in a distributed environment: two parties, each holding private documents, want to collaboratively execute agglomerative document clustering without disclosing the contents of those documents.
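For orientation, the sketch below shows ordinary, non-private agglomerative clustering over the pooled documents of two parties. It is not the protocol proposed in the paper, which the abstract does not specify. The term-frequency vectors, cosine similarity, average linkage, the function names, and the sample documents are all assumptions chosen for illustration. In a privacy-preserving setting, the similarity computation between documents held by different parties is the step that would have to be carried out without revealing either side's text.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a document (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerative(docs, k):
    """Merge the two most similar clusters until k clusters remain.

    Plaintext baseline only: every document is visible to the code,
    so nothing here is privacy-preserving.
    """
    vectors = [tf_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # start with singletons

    def similarity(c1, c2):
        # Average linkage (an assumption): mean pairwise cosine similarity.
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > max(k, 1):
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = similarity(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical sample data: party A holds documents 0 and 2, party B holds 1 and 3.
docs = [
    "secure key encryption",
    "public key encryption scheme",
    "document clustering text",
    "text clustering algorithm",
]
result = agglomerative(docs, 2)
print(result)
```

With this sample data, each resulting cluster contains one document from each party, which is the situation where cross-party similarity values are needed and where a plaintext computation like this one would expose private content.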
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Su, C., Zhou, J., Bao, F., Takagi, T., Sakurai, K. (2007). Two-Party Privacy-Preserving Agglomerative Document Clustering. In: Dawson, E., Wong, D.S. (eds) Information Security Practice and Experience. ISPEC 2007. Lecture Notes in Computer Science, vol 4464. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72163-5_16
DOI: https://doi.org/10.1007/978-3-540-72163-5_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72159-8
Online ISBN: 978-3-540-72163-5
eBook Packages: Computer Science, Computer Science (R0)