Sequential Cost-Sensitive Feature Acquisition

  • Gabriella Contardo
  • Ludovic Denoyer
  • Thierry Artières
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9897)


We propose a reinforcement-learning-based approach to the cost-sensitive learning problem in which each input feature has a specific cost. The acquisition process is handled by a stochastic policy, which allows features to be acquired adaptively. The architecture relies on representation learning so that a prediction can be made on any partially observed sample, whatever the set of its observed features is. The resulting model is an original mix of representation learning and reinforcement learning ideas. It is learned with policy gradient techniques to minimize a budgeted inference cost. We demonstrate the effectiveness of the proposed method through experiments on a variety of datasets, both for the sparse prediction problem, where all features have the same cost, and for some cost-sensitive settings.
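As a rough illustration of the policy gradient idea mentioned in the abstract, the sketch below estimates the gradient of an expected budgeted loss (prediction error plus a weighted acquisition cost) with respect to the parameters of a stochastic acquisition policy. It is a deliberately simplified toy and not the model of the paper: the policy is a set of independent Bernoulli decisions that does not adapt to previously acquired features, the predictor is a fixed linear model rather than a learned representation, and all names (`sample_mask`, `budgeted_loss`, `reinforce_gradient`, `lam`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_mask(probs, rng):
    """Draw one acquisition decision per feature: 1 = acquire, 0 = skip."""
    return [1 if rng.random() < p else 0 for p in probs]


def budgeted_loss(x, y, mask, weights, costs, lam):
    """Squared error of a linear predictor restricted to the acquired
    features, plus lam times the total cost of those features."""
    prediction = sum(w * xi * m for w, xi, m in zip(weights, x, mask))
    acquisition_cost = sum(c * m for c, m in zip(costs, mask))
    return (prediction - y) ** 2 + lam * acquisition_cost


def reinforce_gradient(x, y, logits, weights, costs, lam, rng, n_samples=1000):
    """Score-function (REINFORCE) estimate of the gradient of the expected
    budgeted loss with respect to the policy logits."""
    probs = [sigmoid(z) for z in logits]
    masks = [sample_mask(probs, rng) for _ in range(n_samples)]
    losses = [budgeted_loss(x, y, m, weights, costs, lam) for m in masks]
    # Mean-loss baseline to reduce variance. It is computed from the same
    # samples, which introduces a small bias in this toy estimator.
    baseline = sum(losses) / n_samples
    grad = [0.0] * len(logits)
    for mask, loss in zip(masks, losses):
        for i, (m, p) in enumerate(zip(mask, probs)):
            # For a Bernoulli(sigmoid(z)) decision, d log p(m) / dz = m - p.
            grad[i] += (loss - baseline) * (m - p) / n_samples
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    # Feature 0 is informative (weight 1); feature 1 is useless but still costly.
    grad = reinforce_gradient(
        x=[1.0, 1.0], y=1.0, logits=[0.0, 0.0],
        weights=[1.0, 0.0], costs=[1.0, 1.0], lam=0.1, rng=rng,
    )
    print(grad)
```

A gradient descent step on the logits using this estimate would raise the probability of acquiring the informative feature, since its entry in the gradient is negative. In the uniform-cost (sparse prediction) setting described above, `costs` would simply be a vector of identical values.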



This work was supported by the Labex SMART, funded by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-LABX-65. Part of this work benefited from a grant from the DGA-RAPID program, project LuxidX.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Gabriella Contardo (1), email author
  • Ludovic Denoyer (1)
  • Thierry Artières (2)
  1. Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, Paris, France
  2. Ecole Centrale Marseille-Laboratoire d’Informatique Fondamentale (Aix-Marseille University), Marseille, France
