Cluster Computing, Volume 17, Issue 4, pp 1383–1399

Privacy preserving sub-feature selection based on fuzzy probabilities

  • Hemanta Kumar Bhuyan
  • Narendra Kumar Kamila
Article

Abstract

Feature selection supports the development of accurate classification models in data mining. Aggregating data from a distributed environment for feature selection raises the problem of accessing the relevant inputs of individual data records. Preserving the privacy of individual data is often a critical issue in distributed data mining. This paper proposes privacy preservation of individual data for both feature and sub-feature selection, based on data mining techniques and fuzzy probabilities. To protect privacy, each party follows the data miner's instructions and uses fuzzy probabilities as alias values. The techniques are developed for the data miner's own database in a distributed network with a fuzzy system, and the evaluation of sub-feature values is included in the processing of the data mining task. Feature selection is carried out with an existing data mining technique, gain ratio, using fuzzy optimization. The gain ratio estimated from the relevant inputs is evaluated within the expected upper and lower bounds of the fuzzy data set. The paper focuses mainly on sub-feature selection with a privacy algorithm that uses fuzzy random variables among different parties in a distributed environment. Sub-features are uniquely identified for better class prediction. The algorithm selects sub-features using fuzzy probabilities with fuzzy frequency data from the data miner's database. Experimental results on a real-world data set show the performance of the approach.
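As a point of reference for the gain ratio measure named in the abstract, the following is a minimal sketch of the standard (crisp) gain ratio for a categorical feature. It does not reproduce the paper's fuzzy-probability or privacy-preserving variant, which the abstract does not specify. The function names `entropy` and `gain_ratio` and the toy data are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(feature_values, class_labels):
    """Crisp gain ratio: information gain divided by split information.

    Illustrative helper only; this is not the fuzzy variant from the paper.
    """
    total = len(class_labels)

    # Group the class labels by the feature value they occur with.
    groups = {}
    for value, label in zip(feature_values, class_labels):
        groups.setdefault(value, []).append(label)

    # Expected class entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(class_labels) - conditional

    # Split information is the entropy of the feature's own distribution.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): the first feature determines the class exactly,
# the second is independent of the class.
labels = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
print(gain_ratio(informative, labels), gain_ratio(uninformative, labels))
```

A feature with a higher gain ratio separates the classes better relative to how finely it partitions the records, which is why the measure is commonly used to rank features.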

Keywords

Distributed data mining, Fuzzy probabilities, Privacy, Feature selection

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Computer Science and Engineering, Mahavir Institute of Engineering and Technology, Odisha, India
  2. Department of Computer Science and Engineering, C. V. Raman College of Engineering, Odisha, India