Abstract
This paper addresses the task of variable selection, which consists of choosing a subset of variables sufficient to predict the target label well. Rather than trying to determine directly which variables are better, we exploit prior knowledge to learn the properties of good variables and to guide the selection towards the most relevant dimensions. To this end, we assume that a variable can be represented by a set of indicators that describe both its intrinsic properties and its potential relationship to the target problem. This representation makes it possible to predict the relevance of a variable without measuring its values on the training instances. We devise a selection methodology that can efficiently search for new good variables among a huge number of candidates and dramatically reduces the number of variable measurements needed. The algorithm is illustrated on an industrial CRM application.
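As a rough illustration of the idea summarised above, the following Python sketch (not the authors' implementation; the indicator fields, the relevance scores, and the choice of a random-forest meta-model are all illustrative assumptions) trains a meta-model on indicator descriptions of already-evaluated variables and uses it to rank unmeasured candidate variables by predicted relevance.

```python
# Minimal sketch, assuming: each candidate variable is described by a vector of
# indicators (meta-features), and a meta-model fitted on variables whose relevance
# has already been measured predicts the relevance of variables that have not yet
# been measured on the training instances. Data and model choice are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Indicators for already-evaluated variables (e.g. type, construction depth,
# relation to the target problem) and their measured relevance scores.
evaluated_indicators = rng.random((200, 5))   # 200 variables, 5 indicators each
evaluated_relevance = rng.random(200)         # e.g. a univariate predictive score

# Learn the properties of "good" variables from the evaluated ones.
meta_model = RandomForestRegressor(n_estimators=100, random_state=0)
meta_model.fit(evaluated_indicators, evaluated_relevance)

# Predict relevance for candidates that have not been measured yet and keep
# only the most promising ones for actual measurement/construction.
candidate_indicators = rng.random((10_000, 5))
predicted_relevance = meta_model.predict(candidate_indicators)
top_candidates = np.argsort(predicted_relevance)[::-1][:100]
print("Most promising candidate variables:", top_candidates[:10], "...")
```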
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Fessant, F., Le Cam, A., Boullé, M., Féraud, R. (2010). Modelling Complex Data by Learning Which Variable to Construct. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2010. Lecture Notes in Computer Science, vol 6263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15105-7_26