Towards Automatic Feature Construction for Supervised Classification

  • Marc Boullé
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8724)


We suggest an approach to automate variable construction for supervised learning, especially in the multi-relational setting. Domain knowledge is specified by describing the structure of the data by means of variables, tables and links across tables, and by choosing construction rules. The space of variables that can be constructed is virtually infinite, which raises both combinatorial and over-fitting problems. We introduce a prior distribution over all the constructed variables, as well as an effective algorithm to draw samples of constructed variables from this distribution. Experiments show that the approach is robust and efficient.
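As a rough illustration of what a prior over constructed variables and a sampler for it might look like, here is a minimal sketch under stated assumptions. It assumes a toy schema: a main table with native variables `Age` and `Gender`, linked 1-n to a secondary table `Usages` with `Duration`, `Price` and `Product`. It also assumes four hypothetical construction rules: `Count`, `Mean`, `Max` and `Mode`. None of these names come from the paper. Each construction choice is drawn uniformly among its options, so a variable's prior probability is the product of its choice probabilities and its cost is the sum of their negative logarithms. This is not the paper's prior or sampling algorithm. In particular, the paper's variable space is virtually infinite, whereas this toy schema yields only eight distinct variables.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical toy schema (an assumption for illustration): a main table with
# native variables, linked 1-n to one secondary table.
MAIN_VARS = {"Age": "num", "Gender": "cat"}
SECONDARY = {"Usages": {"Duration": "num", "Price": "num", "Product": "cat"}}

# Hypothetical construction rules: name -> type of the secondary-table
# variable they aggregate (None means the rule takes no variable operand).
RULES = {"Count": None, "Mean": "num", "Max": "num", "Mode": "cat"}


@dataclass(frozen=True)
class Variable:
    expr: str          # textual form, e.g. "Mean(Usages, Price)"
    log_prior: float   # sum of -log(p) over the choices made to build it


def sample_variable(rng: random.Random) -> Variable:
    """Draw one variable; every choice is uniform among its options (assumption)."""
    # Level 1: a native variable of the main table, or a construction rule.
    options = list(MAIN_VARS) + list(RULES)
    choice = rng.choice(options)
    cost = math.log(len(options))
    if choice in MAIN_VARS:
        return Variable(choice, cost)
    # Level 2: the secondary table the rule aggregates over.
    tables = list(SECONDARY)
    table = rng.choice(tables)
    cost += math.log(len(tables))
    operand_type = RULES[choice]
    if operand_type is None:
        return Variable(f"{choice}({table})", cost)
    # Level 3: an operand of the required type in that table
    # (the toy schema guarantees at least one candidate exists).
    operands = [v for v, t in SECONDARY[table].items() if t == operand_type]
    var = rng.choice(operands)
    cost += math.log(len(operands))
    return Variable(f"{choice}({table}, {var})", cost)


def sample_distinct(n_draws: int, seed: int = 0) -> dict[str, float]:
    """Draw n_draws variables and keep the distinct ones with their prior cost."""
    rng = random.Random(seed)
    found: dict[str, float] = {}
    for _ in range(n_draws):
        v = sample_variable(rng)
        found[v.expr] = v.log_prior
    return found


if __name__ == "__main__":
    # Print distinct sampled variables, cheapest (most probable) first.
    for expr, cost in sorted(sample_distinct(2000).items(), key=lambda kv: kv[1]):
        print(f"{cost:.3f}  p={math.exp(-cost):.4f}  {expr}")
```

With a few thousand draws, every one of the eight variables is very likely to appear, and their prior probabilities `exp(-log_prior)` sum to 1. Variables that need more choices get a higher cost. This is one way a prior can penalize complex constructions and limit over-fitting.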


Keywords: supervised learning · relational learning · feature construction · feature selection · regularization




Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Marc Boullé, Orange Labs, Lannion, France
