
A New Fuzzy-Rough Hybrid Merit to Feature Selection

  • Javad Rahimipour Anaraki
  • Saeed Samet
  • Wolfgang Banzhaf
  • Mahdi Eftekhari
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10020)

Abstract

Feature selection is one of the most important preprocessing steps in machine learning, data mining, and bioinformatics. By reducing computational and storage costs, such preprocessing helps defy the curse of dimensionality, facilitates data understanding and visualization, and shortens training and testing times, improving overall performance, especially on large datasets. The correlation-based feature selection (CFS) method uses a conventional merit to evaluate candidate feature subsets. In this paper, we propose a new merit that adapts this correlation-based merit and combines it with fuzzy-rough feature selection, improving on the effectiveness and quality of the conventional methods. The proposed method also outperforms the recently introduced gradient boosted feature selection by selecting more relevant and less redundant features. Two-step experimental results demonstrate the applicability and efficiency of our method on well-known and widely used datasets, as well as newly introduced ones, mostly from the UCI repository, ranging from small to large in both the number of features and the number of samples.
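For context, the conventional merit mentioned above is Hall and Smith's correlation-based measure, which scores a subset S of k features by trading the mean feature-class correlation against the mean feature-feature inter-correlation; the fuzzy-rough side of the hybrid rests on Jensen and Shen's dependency degree. Both are standard formulations, reproduced here for orientation:

```latex
% Hall and Smith's correlation-based (CFS) merit of a subset S of k features,
% where \overline{r_{cf}} is the mean feature-class correlation and
% \overline{r_{ff}} the mean feature-feature inter-correlation.
\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}

% Jensen and Shen's fuzzy-rough dependency degree of decision Q on feature set P,
% with \mu_{\mathrm{POS}_P(Q)}(x) the membership of x in the fuzzy positive region.
\gamma'_P(Q) = \frac{\sum_{x \in \mathbb{U}} \mu_{\mathrm{POS}_P(Q)}(x)}{|\mathbb{U}|}
```

The sketch below shows how a subset merit of this kind can drive a greedy forward search. It is a minimal illustration of the conventional CFS merit only, with the absolute Pearson correlation standing in for Hall's symmetrical-uncertainty measure (an assumption of this sketch, as are the function names); the paper's hybrid merit combines this scheme with the fuzzy-rough dependency degree.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Hall's CFS merit for the given feature subset (column indices of X).

    Absolute Pearson correlation is used as a simple stand-in for Hall's
    symmetrical-uncertainty measure -- an assumption of this sketch.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean feature-class correlation.
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    # Mean feature-feature inter-correlation over all pairs in the subset.
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_forward_selection(X, y, merit=cfs_merit):
    """Add, one at a time, the feature that most improves the subset merit."""
    remaining = list(range(X.shape[1]))
    selected, best = [], 0.0
    while remaining:
        score, f = max((merit(X, y, selected + [f]), f) for f in remaining)
        if score <= best:
            break  # no remaining feature improves the merit
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected
```

For example, calling greedy_forward_selection(X, y) on a NumPy feature matrix X and label vector y returns the indices of the selected columns; swapping in a different merit function is enough to experiment with other subset evaluators.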

Keywords

Feature selection · Fuzzy-rough dependency degree · Correlation merit

Acknowledgments

This work has been partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Research & Development Corporation of Newfoundland and Labrador (RDC).


Copyright information

© Springer-Verlag GmbH Germany 2016

Authors and Affiliations

  • Javad Rahimipour Anaraki (1)
  • Saeed Samet (2)
  • Wolfgang Banzhaf (3)
  • Mahdi Eftekhari (4)
  1. Department of Computer Science, Memorial University of Newfoundland, St. John’s, Canada
  2. Faculty of Medicine, Memorial University of Newfoundland, St. John’s, Canada
  3. Department of Computer Science, Memorial University of Newfoundland, St. John’s, Canada
  4. Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
