
Stable Feature Selection with Privacy Preserving Data Mining Algorithm

  • Mohana Chelvan P
  • Perumal K
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 712)

Abstract

Data mining extracts previously unknown and valuable patterns and information from large archived data stores. Over the last few decades, advances in internet technologies have led to an enormous increase in the dimensionality of the datasets that data mining must handle. Feature selection is an important dimensionality reduction technique because it improves the accuracy, efficiency and model interpretability of data mining algorithms. Feature selection stability can be viewed as the robustness of a feature selection algorithm: a stable algorithm selects the same or a similar subset of features when the dataset is slightly perturbed. Privacy preserving data mining works by modifying the original dataset with a method that protects the privacy of individuals and then running data mining algorithms on the modified data to extract information. This perturbation of the dataset affects feature selection stability, so privacy preserving data mining and feature selection stability are related. This paper explores this problem and introduces a privacy preserving algorithm that has less impact on both feature selection stability and accuracy.
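The abstract's central claim, that perturbing a dataset for privacy changes which features get selected, can be made concrete with a small experiment. The sketch below is illustrative only and does not reproduce the authors' proposed algorithm. It assumes a generic additive Gaussian noise perturbation (`add_noise`), a hypothetical stand-in selector (`select_top_k`) that ranks features by absolute Pearson correlation with the class label, and it measures stability with the Kuncheva Index named in the keywords: for two subsets A and B of size k drawn from n features that share r features, KI(A, B) = (r·n − k²) / (k(n − k)). All function names, parameters (`sigma`, `runs`, `seed`) and the toy data are assumptions made for this example.

```python
import random
from itertools import combinations


def kuncheva_index(a, b, n_features):
    """Kuncheva consistency index for two feature subsets of equal size k.

    KI = (r * n - k^2) / (k * (n - k)), with r = |a & b| and n = n_features.
    1.0 means identical subsets; randomly drawn subsets score 0 on average.
    """
    a, b = set(a), set(b)
    k = len(a)
    if k != len(b) or not 0 < k < n_features:
        raise ValueError("subsets must have equal size k with 0 < k < n_features")
    r = len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_top_k(rows, labels, k):
    """Hypothetical selector: indices of the k features most correlated with the label."""
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    scores = [abs(pearson(col, labels)) for col in columns]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def add_noise(rows, sigma, rng):
    """Generic additive-noise perturbation (an assumption, not the paper's method)."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]


def stability(rows, labels, k, sigma, runs=10, seed=0):
    """Mean pairwise Kuncheva index over subsets chosen from independently perturbed copies."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    subsets = [select_top_k(add_noise(rows, sigma, rng), labels, k) for _ in range(runs)]
    pairs = list(combinations(subsets, 2))
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


# Toy data (synthetic, for illustration only): features 0-2 carry the label, 3-19 are noise.
rng = random.Random(42)
labels = [rng.choice([0, 1]) for _ in range(200)]
rows = [
    [y + rng.gauss(0.0, 0.5) if j < 3 else rng.gauss(0.0, 1.0) for j in range(20)]
    for y in labels
]
for sigma in (0.0, 1.0, 5.0):
    print(f"sigma={sigma}: mean Kuncheva index = {stability(rows, labels, k=3, sigma=sigma):.3f}")
```

With `sigma=0.0` no noise is added, every run selects the same subset, and the mean index is exactly 1.0. As `sigma` grows, the informative features are increasingly masked by the perturbation and the index typically falls toward 0, which is the stability loss the abstract describes. Averaging over all pairs of runs, rather than comparing a single perturbed copy to the original, is one common way to summarise stability and is a choice made here for illustration.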

Keywords

Data mining · Privacy preservation · Feature selection · Selection stability · Kuncheva Index


Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  1. Department of Computer Science, Hindustan College of Arts and Science, Chennai, India
  2. Department of Computer Applications, Madurai Kamaraj University, Madurai, India
