Feature Selection Methods Based on Decision Rule and Tree Models
Feature selection methods, as a preprocessing step to machine learning, are effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. However, the recent growth in data dimensionality poses a severe challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this work, novel concepts of relevant feature selection based on information gathered from decision rule and decision tree models are introduced. Two new measures, DRQualityImp and DTLevelImp, are also defined. The first is based on feature presence frequency and rule quality, while the second is based on feature presence at different levels of a decision tree. The efficiency and effectiveness of these methods are demonstrated on five real-world datasets. Promising initial classification results were obtained, together with a substantial reduction in problem dimensionality.
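The abstract names the ingredients of the two measures but not their exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' definitions. It assumes that a DRQualityImp-style score sums the quality of every rule whose conditions mention a feature (combining presence frequency with rule quality), and that a DTLevelImp-style score adds 1/level for every split on a feature, with the root at level 1, so splits near the root weigh more. The names `Rule`, `Node`, `rule_quality_importance`, `tree_level_importance` and `top_k`, the [0, 1] quality scale, and the 1/level weighting are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Rule:
    """Hypothetical decision rule: features used in its conditions plus a quality score."""
    features: Set[str]
    quality: float  # assumed: a rule-quality value in [0, 1], e.g. accuracy


@dataclass
class Node:
    """Hypothetical decision-tree node; a leaf has feature=None and no children."""
    feature: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def rule_quality_importance(rules: List[Rule]) -> Dict[str, float]:
    """DRQualityImp-style sketch (assumed form): each rule in which a feature
    appears adds that rule's quality to the feature's score."""
    scores: Dict[str, float] = defaultdict(float)
    for rule in rules:
        for feature in rule.features:
            scores[feature] += rule.quality
    return dict(scores)


def tree_level_importance(root: Node) -> Dict[str, float]:
    """DTLevelImp-style sketch (assumed form): a split on a feature at depth
    `level` (root = 1) adds 1 / level, so splits near the root weigh more."""
    scores: Dict[str, float] = defaultdict(float)
    stack: List[Tuple[Node, int]] = [(root, 1)]
    while stack:
        node, level = stack.pop()
        if node.feature is not None:
            scores[node.feature] += 1.0 / level
        for child in node.children:
            stack.append((child, level + 1))
    return dict(scores)


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Rank features by descending score (ties broken by name); keep the top k."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [feature for feature, _ in ranked[:k]]


# Usage on toy inputs (all feature names and numbers are made up).
rules = [Rule({"a", "b"}, 0.9), Rule({"a"}, 0.8), Rule({"c"}, 0.5)]
print(top_k(rule_quality_importance(rules), 2))  # ['a', 'b']

tree = Node("a", [Node("b", [Node(), Node()]),
                  Node("c", [Node("a", [Node(), Node()]), Node()])])
print(top_k(tree_level_importance(tree), 2))  # ['a', 'b']
```

In this toy tree, feature `a` scores 1 + 1/3 because it splits at the root and again at level 3, while `b` and `c` each score 1/2. Keeping the top-k features from such a ranking is one simple way to obtain the reduced feature set; any threshold or k used in the actual study is not stated here.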
Keywords: Feature selection; Feature ranking; Decision rules; Dimensionality reduction; Relevance and irrelevance
This work was supported by the Center for Innovation and Transfer of Natural Sciences and Engineering Knowledge at the University of Rzeszów.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.