Abstract
All-relevant feature selection is a relatively new sub-field in the domain of feature selection. This chapter gives a short review of the field and presents a representative algorithm. The problem of all-relevant feature selection is first defined, then key algorithms are described. Finally, the Boruta algorithm, under development at ICM, University of Warsaw, is explained in greater detail and applied to collections of both synthetic and real-world data sets. It is shown that the algorithm is both sensitive and selective. The level of falsely discovered relevant variables is low: on average, fewer than one falsely relevant variable is discovered for each set. The sensitivity of the algorithm is nearly 100% for data sets for which classification is easy, but may be lower for data sets for which classification is difficult. Nevertheless, the sensitivity can be increased at the cost of additional computational effort, without adversely affecting the false discovery level, by increasing the number of trees in the random forest that delivers the importance estimates used by Boruta.
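The kind of analysis outlined above can be reproduced with the Boruta R package cited in the references (Kursa and Rudnicki 2010). The minimal sketch below assumes the Boruta and mlbench packages are installed; the Ozone data set, the maxRuns cap and the ntree value are illustrative choices rather than the exact settings used in the chapter.

    library(Boruta)
    library(mlbench)

    set.seed(17)
    data(Ozone)              # LA ozone data from mlbench; column V4 is the response
    ozone <- na.omit(Ozone)  # Boruta requires complete cases

    # Run Boruta. Extra arguments such as ntree are passed through to the
    # underlying random forest, so raising ntree buys extra sensitivity at
    # the cost of computation time, as discussed in the abstract.
    result <- Boruta(V4 ~ ., data = ozone, doTrace = 1, maxRuns = 100, ntree = 1000)
    print(result)

    getSelectedAttributes(result)       # attributes confirmed relevant
    fixed <- TentativeRoughFix(result)  # resolve attributes still Tentative
    attStats(fixed)                     # per-attribute importance statistics

Attributes left Tentative after maxRuns iterations can also be resolved by simply rerunning with a larger maxRuns instead of applying TentativeRoughFix.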
References
Bache, K., Lichman, M.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2013)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Dramiński, M., Kierczak, M., Koronacki, J., Komorowski, J.: Monte Carlo feature selection and interdependency discovery in supervised classification. In: Koronacki, J. (ed.) Advances in Machine Learning II. SCI, vol. 263, pp. 371–385. Springer (2010)
Dramiński, M., Rada-Iglesias, A., Enroth, S., Wadelius, C., Koronacki, J., Komorowski, J.: Monte Carlo feature selection for supervised classification. Bioinformatics 24(1), 110–117 (2008)
Gunduz, N., Fokoue, E.: Turkiye Student Evaluation Data Set. UCI machine learning repository. http://archive.ics.uci.edu/ml/datasets/Turkiye+Student+Evaluation (2013)
Huynh-Thu, V.A., Wehenkel, L., Geurts, P.: Exploiting tree-based variable importances to selectively identify relevant variables. In: JMLR: Workshop and Conference Proceedings, vol. 4, pp. 60–73 (2008)
Kohavi, R., John, G.: Wrappers for feature subset selection. Artif. Intell. 97(1), 273–324 (1997)
Kursa, M.B., Rudnicki, W.R.: Feature selection with the Boruta package. J. Stat. Softw. 36(11), 1–13 (2010)
Kursa, M.B., Jankowski, A., Rudnicki, W.R.: Boruta—a system for feature selection. Fundam. Inform. 101(4), 271–285 (2010)
Leisch, F., Dimitriadou, E.: mlbench: machine learning benchmark problems. R package version 2.1-1 (2010)
Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2(3), 18–22 (2002). http://CRAN.R-project.org/doc/Rnews/
Mansouri, K., Ringsted, T., Ballabio, D., Todeschini, R., Consonni, V.: Quantitative structure-activity relationship models for ready biodegradability of chemicals. J. Chem. Inf. Model. 53(4), 867–878 (2013)
Nilsson, R., Peña, J.M., Björkegren, J., Tegnér, J.: Detecting multivariate differentially expressed genes. BMC Bioinform. 8, 150 (2007)
Rudnicki, W.R., Kierczak, M., Koronacki, J., Komorowski, J.: A statistical method for determining importance of variables in an information system. In: Greco, S., Hata, Y., Hirano, S., Inuiguchi, M., Miyamoto, S., Nguyen, H., Slowinski, R. (eds.) Rough Sets and Current Trends in Computing, vol. 4259, pp. 557–566. Springer, Berlin/Heidelberg (2006)
Stoppiglia, H., Dreyfus, G., Dubois, R., Oussar, Y.: Ranking a random feature for variable and feature selection. J. Mach. Learn. Res. 3(7–8), 1399–1414 (2003)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2012). http://www.R-project.org/
Tuv, E., Borisov, A., Torkkola, K.: Feature selection using ensemble based ranking against artificial contrasts. In: The 2006 IEEE International Joint Conference on Neural Network Proceedings, pp. 2181–2186. IEEE (2006)
Acknowledgments
Computations were partially performed at the Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Poland, grant G34-5. The authors would like to thank Mr. Rafał Niemiec for technical help.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
Cite this chapter
Rudnicki, W.R., Wrzesień, M., Paja, W. (2015). All Relevant Feature Selection Methods and Applications. In: Stańczyk, U., Jain, L. (eds) Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence, vol 584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45620-0_2
Print ISBN: 978-3-662-45619-4
Online ISBN: 978-3-662-45620-0