Abstract
High-dimensional data are ubiquitous in regression. To better understand the data or to ease the learning process, it is important to reduce the data to a subset of the most relevant features. Among feature selection methods, filter methods are popular because they are independent of the model, which makes them fast and computationally simpler than wrapper or embedded approaches. The key component of a filter method is its filter criterion. This paper examines which properties make a good filter criterion, so that one can be chosen among the numerous existing ones. Six properties are discussed, and three filter criteria are compared with respect to these properties.
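To make the idea of a filter method concrete, the following is a minimal illustrative sketch, not one of the criteria compared in the paper: features are scored independently of any learning model, here by the absolute Pearson correlation with the target, and the top-ranked ones are kept. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_select(features, target, k):
    """Rank features by |correlation| with the target and keep the top k.

    features: list of feature columns (each a list of sample values).
    Returns (indices of the k best features, all relevance scores).
    """
    scores = [abs(pearson(col, target)) for col in features]
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores

# Example: feature 0 is perfectly linearly related to the target,
# feature 1 is only weakly related, so the filter keeps feature 0.
X = [[1, 2, 3, 4, 5], [5, 1, 4, 2, 3]]
y = [2, 4, 6, 8, 10]
selected, scores = filter_select(X, y, k=1)
```

The model independence of the scoring step is exactly what makes filter methods fast: each feature's relevance is estimated once, without retraining a regressor for every candidate subset. The criteria discussed in the paper (e.g., mutual-information-based scores) replace the correlation score with estimators that can also capture nonlinear relevance.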
© 2019 Springer Nature Switzerland AG
Cite this paper
Degeest, A., Verleysen, M., Frénay, B. (2019). Comparison Between Filter Criteria for Feature Selection in Regression. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. Lecture Notes in Computer Science, vol. 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_5
Print ISBN: 978-3-030-30483-6
Online ISBN: 978-3-030-30484-3