Abstract
Publishing data about individuals without revealing sensitive information about them is an important problem. Distributed data mining applications use sensitive data from distributed databases held by different parties, which conflicts directly with an individual’s need for, and right to, privacy. It is therefore important to develop adequate security techniques that protect the privacy of the individual values used in data mining. In this paper, we study how to maintain privacy in distributed data mining, that is, how two or more parties can find frequent itemsets in a distributed database without revealing each party’s portion of the data to the others. We consider a privacy-preserving naïve Bayes classifier for horizontally partitioned distributed data and propose a data mining privacy by decomposition (DMPD) method, which uses a genetic algorithm to search for an optimal feature set partitioning guided by classification accuracy and k-anonymity constraints.
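As a rough illustration of the k-anonymity constraint mentioned in the abstract, the following Python sketch checks whether the projection of a table onto a feature subset is k-anonymous, and whether every subset in a feature set partitioning passes that check. It is not the authors' implementation: the function names, the sample records, and the assumption that each subset's projection must be k-anonymous on its own are illustrative choices made here.

```python
from collections import Counter


def is_k_anonymous(records, features, k):
    """Return True if every combination of values over `features`
    appears in at least k records (hypothetical helper)."""
    counts = Counter(tuple(row[f] for f in features) for row in records)
    return all(c >= k for c in counts.values())


def partition_is_k_anonymous(records, partition, k):
    """Assumed constraint: each feature subset of the partitioning
    must yield a k-anonymous projection by itself."""
    return all(is_k_anonymous(records, subset, k) for subset in partition)


# Made-up, already generalized records, used for illustration only.
records = [
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "130**", "diagnosis": "cold"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
]

# Taken together, the three features single out individual records ...
all_features_ok = is_k_anonymous(records, ["age", "zip", "diagnosis"], 2)

# ... while splitting off "diagnosis" leaves one subset that still fails,
# because "cold" occurs only once in this sample.
split_ok = partition_is_k_anonymous(records, [["age", "zip"], ["diagnosis"]], 2)

# The quasi-identifier subset alone satisfies 2-anonymity.
quasi_ok = is_k_anonymous(records, ["age", "zip"], 2)
```

In a genetic search of the kind the abstract describes, a check like this could act as a feasibility filter on candidate partitionings, with classification accuracy as the fitness to maximize; how the paper actually combines the two criteria is not stated in the abstract.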
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Jena, L., Kamila, N.K., Mishra, S. (2014). Privacy Preserving Distributed Data Mining with Evolutionary Computing. In: Satapathy, S., Udgata, S., Biswal, B. (eds) Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2013. Advances in Intelligent Systems and Computing, vol 247. Springer, Cham. https://doi.org/10.1007/978-3-319-02931-3_29
DOI: https://doi.org/10.1007/978-3-319-02931-3_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02930-6
Online ISBN: 978-3-319-02931-3
eBook Packages: Engineering (R0)