Abstract
In any modeling problem, variable selection is a preprocessing step that selects the variables most relevant to the output variable. Forward selection is the most straightforward strategy for variable selection; its application using mutual information is simple, intuitive and effective, and it is commonly used in the machine learning literature. However, the problem of when to stop the forward process has no direct, satisfactory solution because of inaccuracies in the mutual information estimation, especially as the number of variables considered increases. This work proposes a modified stopping criterion for this variable selection methodology that uses the Markov blanket concept. As will be shown, this approach can increase the performance and applicability of the stopping criterion of a forward selection process using mutual information.
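As a rough illustration of the baseline procedure the abstract refers to, the sketch below runs a greedy forward selection driven by mutual information on discrete data. It is not the method proposed in the paper: the stopping rule here is a plain gain threshold rather than the Markov blanket criterion, and the mutual information is a simple plug-in count estimate. The names `mutual_information`, `forward_selection` and `min_gain`, as well as the toy data, are assumptions made for this example only.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two sequences of discrete symbols."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def forward_selection(columns, target, min_gain=0.01):
    """Greedy forward selection over `columns` (dict: name -> list of values).

    At each step the variable that maximises the joint mutual information
    between the selected set and the target is added. The process stops when
    the improvement falls below `min_gain` (a hypothetical threshold, standing
    in for a proper stopping criterion).
    """
    selected = []
    best_mi = 0.0
    remaining = sorted(columns)
    while remaining:
        scores = {}
        for name in remaining:
            # Treat the candidate set as one joint variable (tuple per sample).
            joint = list(zip(*(columns[v] for v in selected + [name])))
            scores[name] = mutual_information(joint, target)
        best = max(remaining, key=lambda v: scores[v])
        if scores[best] - best_mi < min_gain:
            break
        selected.append(best)
        best_mi = scores[best]
        remaining.remove(best)
    return selected, best_mi


# Toy data: full factorial design over three binary inputs.
# The output depends on a and b only; c is irrelevant.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
cols = {
    "a": [r[0] for r in rows],
    "b": [r[1] for r in rows],
    "c": [r[2] for r in rows],
}
y = [2 * r[0] + r[1] for r in rows]

selected, mi = forward_selection(cols, y)
print(selected, mi)
```

On this noiseless toy set the threshold is enough to reject the irrelevant variable. With few samples and many candidate variables, estimates of the joint mutual information become unreliable, and a fixed threshold becomes hard to set, which is the difficulty the abstract describes.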
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Herrera, L.J., Rubio, G., Pomares, H., Paechter, B., Guillén, A., Rojas, I. (2009). Strengthening the Forward Variable Selection Stopping Criterion. In: Alippi, C., Polycarpou, M., Panayiotou, C., Ellinas, G. (eds) Artificial Neural Networks – ICANN 2009. ICANN 2009. Lecture Notes in Computer Science, vol 5769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04277-5_22
DOI: https://doi.org/10.1007/978-3-642-04277-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04276-8
Online ISBN: 978-3-642-04277-5
eBook Packages: Computer Science, Computer Science (R0)