
Privacy Preserving Nearest Neighbor Search

  • Mark Shaneck
  • Yongdae Kim
  • Vipin Kumar
Chapter

Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g., HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this chapter we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification. We prove the security of these algorithms under the semi-honest adversarial model, and describe methods that can be used to optimize their performance.

Keywords: Privacy Preserving Data Mining, Nearest Neighbor Search, Outlier Detection, Clustering, Classification, Secure Multiparty Computation
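
To make the setting concrete, the following is a minimal, non-private sketch of the result that a nearest neighbor search over horizontally distributed data is meant to produce: each party holds complete records for a different subset of entities, and the k nearest neighbors of a query are drawn from the union of all parties' records. This is not the chapter's protocol and offers no privacy guarantee. The function names (`local_candidates`, `global_knn`), the toy data, and the choice of Euclidean distance are assumptions made only for illustration. In a privacy preserving version, the distance comparisons done in the clear here would instead be carried out with secure multiparty computation primitives.

```python
import heapq
import math


def local_candidates(query, records, k):
    """Return one party's k closest records as (distance, record) pairs."""
    # Euclidean distance is an assumption; the abstract does not fix a metric.
    scored = [(math.dist(query, record), record) for record in records]
    return heapq.nsmallest(k, scored)


def global_knn(query, parties, k):
    """Merge every party's local top-k into the global k nearest neighbors.

    NOTE: this plain merge reveals distances and records to whoever runs it.
    It only defines the output that a secure protocol would have to match.
    """
    merged = []
    for records in parties:
        merged.extend(local_candidates(query, records, k))
    return [record for _, record in heapq.nsmallest(k, merged)]


# Hypothetical toy data: two parties, each holding full records (rows).
party_a = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0)]
party_b = [(0.5, 0.0), (9.0, 9.0)]

neighbors = global_knn((0.0, 0.0), [party_a, party_b], k=3)
print(neighbors)
```

Any global k nearest neighbor must be among the k closest records of the party that holds it, so merging the local top-k lists is enough to recover the global answer.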



Copyright information

© Springer-Verlag US 2009

Authors and Affiliations

  1. Liberty University, Lynchburg, VA
  2. University of Minnesota, Minneapolis, MN
