A Novel Outlook on Feature Selection as a Multi-objective Problem

Barbiero, Pietro; Lutton, Evelyne; Squillero, Giovanni; Tonda, Alberto

doi:10.1007/978-3-030-45715-0_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12052)

Included in the following conference series:

International Conference on Artificial Evolution (Evolution Artificielle)

Abstract

Feature selection is the process of choosing, or removing, features to obtain the most informative feature subset of minimal size. Such subsets are used to improve the performance of machine learning algorithms and to enable human understanding of the results. Approaches to feature selection in the literature exploit several optimization algorithms. Multi-objective methods have also been proposed, minimizing the number of features and the error at the same time. While most approaches assess error by resorting to the average of a stochastic K-fold cross-validation, comparing averages might be misleading. In this paper, we show how feature subsets with different average errors might in fact be non-separable when compared using a statistical test. Following this idea, clusters of non-separable optimal feature subsets are identified. The performance of a feature selection algorithm can thus be evaluated by verifying how many of these optimal feature subsets it is able to identify. We therefore propose a multi-objective optimization approach to feature selection, EvoFS, with three objectives: (i) minimize feature subset size, (ii) minimize test error on a 10-fold cross-validation using a specific classifier, and (iii) maximize the analysis of variance value of the lowest-performing feature in the set. Experiments on classification datasets whose feature subsets can be exhaustively evaluated show that our approach always finds the best feature subsets. Further experiments on a high-dimensional classification dataset that cannot be exhaustively analyzed show that our approach finds more optimal feature subsets than state-of-the-art feature selection algorithms.
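
To make the three-objective formulation concrete, the sketch below shows how candidate feature subsets could be compared by Pareto dominance once each has been scored on subset size, cross-validation error, and the analysis of variance value of its weakest feature. It is a minimal illustration and not the EvoFS implementation: the names Candidate, objectives, dominates and pareto_front are hypothetical, the scores are made-up numbers rather than results from the paper, and the evolutionary search and the statistical test described in the abstract are not reproduced here.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """A hypothetical, already-evaluated feature subset."""
    features: FrozenSet[str]  # names of the selected features
    cv_error: float           # mean 10-fold cross-validation error (minimize)
    min_anova: float          # ANOVA value of the weakest feature (maximize)


def objectives(c: Candidate) -> Tuple[int, float, float]:
    # Express all three objectives as quantities to minimize;
    # the ANOVA value is negated because it is to be maximized.
    return (len(c.features), c.cv_error, -c.min_anova)


def dominates(a: Candidate, b: Candidate) -> bool:
    # a dominates b if it is no worse on every objective
    # and differs on at least one of them.
    oa, ob = objectives(a), objectives(b)
    return oa != ob and all(x <= y for x, y in zip(oa, ob))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    # Keep only the candidates that no other candidate dominates.
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up scores, for illustration only (not taken from the paper).
candidates = [
    Candidate(frozenset({"f1"}), 0.20, 50.0),
    Candidate(frozenset({"f1", "f2"}), 0.10, 30.0),
    Candidate(frozenset({"f1", "f3"}), 0.15, 20.0),
    Candidate(frozenset({"f1", "f2", "f3"}), 0.10, 20.0),
    Candidate(frozenset({"f2"}), 0.25, 60.0),
]

front = pareto_front(candidates)
for c in front:
    print(sorted(c.features), c.cv_error, c.min_anova)
```

With these illustrative scores, the subsets {f1}, {f1, f2} and {f2} remain on the front, while the other two are dominated by {f1, f2}. Note that this filter compares average errors directly; the abstract's point is that such averages may not be statistically separable, so a faithful evaluation would additionally group subsets whose per-fold errors cannot be distinguished by a statistical test.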

Notes

1. https://bitbucket.org/evomlteam/moea-feature-selection
2. Intel® Core™ i7-8750H 2.20 GHz, 8 GB RAM.

Author information

Authors and Affiliations

Politecnico di Torino, Torino, Italy
Pietro Barbiero & Giovanni Squillero
UMR 782, Université Paris-Saclay, INRA, AgroParisTech, Thiverval-Grignon, France
Evelyne Lutton & Alberto Tonda

Corresponding author

Correspondence to Alberto Tonda.

Editor information

Editors and Affiliations

IRIMAS Institute, ENSISA, Mulhouse, France
Lhassane Idoumghar
Inria Bordeaux Sud-Ouest, IMB, University of Bordeaux, Talence, France
Pierrick Legrand
Research Center in Computer Science, Signal and Automatic Control of Lille, University of Lille, Villeneuve d'Ascq, France
Arnaud Liefooghe
GMPA, INRA, Thiverval-Grignon, France
Evelyne Lutton
Laboratoire d'Informatique, University of Tours, Tours, France
Nicolas Monmarché
Inria Saclay, University of Paris-Sud, Orsay, France
Marc Schoenauer

About this paper

Cite this paper

Barbiero, P., Lutton, E., Squillero, G., Tonda, A. (2020). A Novel Outlook on Feature Selection as a Multi-objective Problem. In: Idoumghar, L., Legrand, P., Liefooghe, A., Lutton, E., Monmarché, N., Schoenauer, M. (eds) Artificial Evolution. EA 2019. Lecture Notes in Computer Science, vol 12052. Springer, Cham. https://doi.org/10.1007/978-3-030-45715-0_6

DOI: https://doi.org/10.1007/978-3-030-45715-0_6
Published: 29 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45714-3
Online ISBN: 978-3-030-45715-0
eBook Packages: Computer Science, Computer Science (R0)
