Abstract
This paper presents a supervised feature selection method for regression problems. The method relies on a dissimilarity matrix, originally developed for classification problems and extended here to regression, built from the conditional mutual information between features with respect to a continuous relevant variable that represents the regression function. An agglomerative hierarchical clustering procedure applied to this matrix selects a subset of the original features. The proposed technique is compared with three other methods. Experiments on four data sets of different nature show the importance of the selected features in terms of regression estimation error, measured as the Root Mean Squared Error (RMSE) of a Support Vector Regression model.
This work was supported by the Spanish Ministry of Science and Innovation under projects Consolider Ingenio 2010 CSD2007-00018 and EODIX AYA2008-05965-C04-04/ESP, and by Fundació Caixa-Castelló through project P1 1B2007-48.
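The selection pipeline described in the abstract can be sketched as follows. This is a minimal, illustrative reconstruction, not the authors' implementation: it replaces the paper's conditional-mutual-information dissimilarity with a simpler normalized-mutual-information distance estimated by histogram binning, uses average-linkage agglomerative merging, and picks as each cluster's representative the feature most informative about the target. All function names, the bin count, and the synthetic data are assumptions for illustration only.

```python
import math
import random

def discretize(values, bins=8):
    """Equal-width binning of a continuous variable into integer bin indices."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(a, b):
    """Plug-in mutual information estimate (in nats) for two discrete sequences."""
    n = len(a)
    pa, pb, pab = {}, {}, {}
    for x, y in zip(a, b):
        pa[x] = pa.get(x, 0) + 1
        pb[y] = pb.get(y, 0) + 1
        pab[(x, y)] = pab.get((x, y), 0) + 1
    mi = 0.0
    for (x, y), c in pab.items():
        mi += (c / n) * math.log(c * n / (pa[x] * pb[y]))
    return max(mi, 0.0)

def select_features(X, y, k, bins=8):
    """Cluster features on an MI-based dissimilarity; keep one feature per cluster."""
    d = len(X[0])
    cols = [discretize([row[j] for row in X], bins) for j in range(d)]
    ty = discretize(y, bins)

    def dissim(i, j):
        # 1 - normalized MI: near 0 for redundant features, near 1 for unrelated ones.
        mi = mutual_information(cols[i], cols[j])
        hi = mutual_information(cols[i], cols[i])  # MI(X, X) = H(X)
        hj = mutual_information(cols[j], cols[j])
        return 1.0 - mi / max(min(hi, hj), 1e-12)

    # Agglomerative merging (average linkage) until k clusters remain.
    clusters = [[j] for j in range(d)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = sum(dissim(i, j) for i in clusters[a] for j in clusters[b]) \
                       / (len(clusters[a]) * len(clusters[b]))
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)

    # Representative per cluster: the feature most informative about the target.
    return sorted(max(c, key=lambda j: mutual_information(cols[j], ty))
                  for c in clusters)

# Synthetic example: x3 nearly duplicates x1, x4 is irrelevant noise.
random.seed(0)
n = 400
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [v + random.gauss(0, 0.05) for v in x1]
x4 = [random.gauss(0, 1) for _ in range(n)]
y = [a + b for a, b in zip(x1, x2)]
X = list(zip(x1, x2, x3, x4))
print(select_features(X, y, k=3))
```

In the paper itself, the resulting feature subsets are then evaluated by training Support Vector Regression on them and comparing the RMSE; that evaluation step is omitted here to keep the sketch dependency-free.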
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Latorre Carmona, P., Sotoca, J.M., Pla, F., Phoa, F.K.H., Bioucas Dias, J. (2011). Feature Selection in Regression Tasks Using Conditional Mutual Information. In: Vitrià, J., Sanches, J.M., Hernández, M. (eds) Pattern Recognition and Image Analysis. IbPRIA 2011. Lecture Notes in Computer Science, vol 6669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21257-4_28
DOI: https://doi.org/10.1007/978-3-642-21257-4_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21256-7
Online ISBN: 978-3-642-21257-4
eBook Packages: Computer Science (R0)