Modelling Complex Data by Learning Which Variable to Construct

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6263)

Abstract

This paper addresses the task of variable selection, which consists of choosing a subset of variables that is sufficient to predict the target label well. Here, instead of trying to determine directly which variables are better, we use prior knowledge to learn the properties of good variables and guide the selection towards the most relevant dimensions. For this purpose, we assume that a variable can be represented by a set of indicators that describe both the properties of the variable and its potential relationship to the targeting problem. This approach makes it possible to predict the relevance of variables without measuring their values on the training instances. We devise a selection methodology that can efficiently search for new good variables among a huge number of candidate variables and dramatically reduce the number of variable measurements needed. Our algorithm is illustrated on an industrial CRM application.
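The idea of predicting a variable's relevance from its indicators, before measuring it, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the indicators, the learner, or the search procedure. The sketch assumes that each variable is described by a numeric indicator vector, that relevance has already been measured for a few variables, and that a simple k-nearest-neighbour average stands in for the learned relevance model. The names `predict_relevance`, `select_candidates`, `measured`, and `candidates`, as well as all numbers, are hypothetical.

```python
import math

def predict_relevance(indicators, measured, k=2):
    """Estimate relevance from indicators alone, as the mean relevance of
    the k measured variables with the closest indicator vectors."""
    by_distance = sorted(
        (math.dist(indicators, ind), rel) for ind, rel in measured
    )
    nearest = by_distance[:k]
    return sum(rel for _, rel in nearest) / len(nearest)

def select_candidates(candidates, measured, budget, k=2):
    """Return the names of the `budget` unmeasured variables with the
    highest predicted relevance; only these would then be measured."""
    scored = [
        (predict_relevance(ind, measured, k), name)
        for name, ind in candidates.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:budget]]

# Hypothetical data: (indicator vector, measured relevance) pairs.
measured = [
    ((1.0, 0.0), 0.9),
    ((0.9, 0.1), 0.8),
    ((0.0, 1.0), 0.1),
    ((0.1, 0.9), 0.2),
]

# Hypothetical candidate variables whose values have not been measured yet.
candidates = {
    "v_a": (0.95, 0.05),
    "v_b": (0.05, 0.95),
    "v_c": (0.80, 0.20),
}

to_measure = select_candidates(candidates, measured, budget=2)
print(to_measure)
```

With a measurement budget of two, only the two candidates whose indicators resemble those of previously relevant variables are selected for measurement; the third is never evaluated on the training instances, which is where the saving in variable measurements would come from.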

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fessant, F., Le Cam, A., Boullé, M., Féraud, R. (2010). Modelling Complex Data by Learning Which Variable to Construct. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2010. Lecture Notes in Computer Science, vol 6263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15105-7_26

  • DOI: https://doi.org/10.1007/978-3-642-15105-7_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15104-0

  • Online ISBN: 978-3-642-15105-7

  • eBook Packages: Computer Science, Computer Science (R0)
