Abstract
This paper addresses the task of variable selection, which consists of choosing a subset of variables sufficient to predict the target label well. Rather than trying to determine directly which variables are better, we exploit prior knowledge to learn the properties of good variables and to guide the selection towards the most relevant dimensions. To this end, we assume that a variable can be represented by a set of indicators that describe both its intrinsic properties and its potential relationship to the target problem. This representation makes it possible to predict the relevance of a variable without measuring its values on the training instances. We devise a selection methodology that can efficiently search for new good variables among a huge number of candidates and dramatically reduces the number of variable measurements needed. The algorithm is illustrated on an industrial CRM application.
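As a rough illustration of the idea summarised above, the following Python sketch (not the authors' implementation; the indicator fields, the relevance scores, and the choice of a random-forest meta-model are all illustrative assumptions) trains a meta-model on indicator descriptions of already-evaluated variables and uses it to rank unmeasured candidate variables by predicted relevance.

```python
# Minimal sketch, assuming: each candidate variable is described by a vector of
# indicators (meta-features), and a meta-model fitted on variables whose relevance
# has already been measured predicts the relevance of variables that have not yet
# been measured on the training instances. Data and model choice are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Indicators for already-evaluated variables (e.g. type, construction depth,
# relation to the target problem) and their measured relevance scores.
evaluated_indicators = rng.random((200, 5))   # 200 variables, 5 indicators each
evaluated_relevance = rng.random(200)         # e.g. a univariate predictive score

# Learn the properties of "good" variables from the evaluated ones.
meta_model = RandomForestRegressor(n_estimators=100, random_state=0)
meta_model.fit(evaluated_indicators, evaluated_relevance)

# Predict relevance for candidates that have not been measured yet and keep
# only the most promising ones for actual measurement/construction.
candidate_indicators = rng.random((10_000, 5))
predicted_relevance = meta_model.predict(candidate_indicators)
top_candidates = np.argsort(predicted_relevance)[::-1][:100]
print("Most promising candidate variables:", top_candidates[:10], "...")
```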
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Fessant, F., Le Cam, A., Boullé, M., Féraud, R. (2010). Modelling Complex Data by Learning Which Variable to Construct. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2010. Lecture Notes in Computer Science, vol 6263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15105-7_26