Policy Search in a Space of Simple Closed-form Formulas: Towards Interpretability of Reinforcement Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7569)

Abstract

In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple closed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem in which each arm corresponds to a candidate policy, canonically represented by its shortest formula-based representation. Experiments conducted on standard benchmarks show that this approach discovers policies that are both efficient and interpretable.
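
To make the abstract's recipe concrete, here is a minimal Python sketch of the idea, not the authors' implementation: it enumerates a toy grammar of closed-form formulas, collapses semantically equivalent formulas to their shortest representative (a cheap stand-in for the paper's canonical shortest-formula representation), and runs a standard UCB1 bandit in which each arm is one formula-induced policy. The chain environment, the unary-primitive grammar, and the probe-based canonicalization are all illustrative assumptions.

```python
import math
import random

# Toy deterministic chain (an illustrative assumption, not a benchmark from
# the paper): scalar state s, actions in {-1, 0, +1}, transition
# s' = s + 0.2 * a, reward -|s'| for staying near the origin.
ACTIONS = (-1, 0, 1)

def step(s, a):
    s2 = s + 0.2 * a
    return s2, -abs(s2)

def episode_return(policy, horizon=30):
    s, total = random.uniform(-1.0, 1.0), 0.0
    for _ in range(horizon):
        s, r = step(s, policy(s))
        total += r
    return total

# A tiny formula grammar: compositions of unary primitives applied to the state.
PRIMS = {'neg': lambda v: -v, 'abs': abs, 'sq': lambda v: v * v}

def enumerate_formulas(max_depth=2):
    """All compositions up to max_depth; formulas with identical values on a
    few probe points collapse to the shortest name, approximating a canonical
    shortest-formula representation of each candidate policy."""
    probes = (-0.9, -0.3, 0.1, 0.7)
    formulas = [('x', lambda v: v)]
    for _ in range(max_depth):
        formulas += [(f'{name}({inner_name})',
                      lambda v, g=g, f=f: g(f(v)))
                     for name, g in PRIMS.items()
                     for inner_name, f in list(formulas)]
    canonical = {}
    for name, f in sorted(formulas, key=lambda nf: len(nf[0])):
        signature = tuple(round(f(p), 9) for p in probes)
        canonical.setdefault(signature, (name, f))  # keep shortest per class
    return list(canonical.values())

def policy_from(formula):
    """A formula induces a policy: rank candidate actions by the formula value
    of the successor state and pick the top-ranked one."""
    return lambda s: max(ACTIONS, key=lambda a: formula(step(s, a)[0]))

def ucb1_search(arms, budget=1500, c=2.0):
    """Plain UCB1 over candidate policies; one pull = one evaluation episode."""
    n, mu = [0] * len(arms), [0.0] * len(arms)
    for t in range(1, budget + 1):
        if t <= len(arms):
            k = t - 1                        # play every arm once to initialise
        else:
            k = max(range(len(arms)),
                    key=lambda i: mu[i] + math.sqrt(c * math.log(t) / n[i]))
        r = episode_return(policy_from(arms[k][1]))
        n[k] += 1
        mu[k] += (r - mu[k]) / n[k]
    best = max(range(len(arms)), key=lambda i: mu[i])
    return arms[best][0], mu[best]

if __name__ == '__main__':
    random.seed(0)
    name, score = ucb1_search(enumerate_formulas())
    print(f'best formula: {name}   mean return: {score:.2f}')
```

On this toy problem the search settles on a formula such as neg(abs(x)), which reads directly as "prefer the action whose successor state is closest to the origin": the interpretability claim in the abstract is precisely that the winning policy stays legible in this way.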

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maes, F., Fonteneau, R., Wehenkel, L., Ernst, D. (2012). Policy Search in a Space of Simple Closed-form Formulas: Towards Interpretability of Reinforcement Learning. In: Ganascia, J.-G., Lenca, P., Petit, J.-M. (eds) Discovery Science. DS 2012. Lecture Notes in Computer Science (LNAI), vol. 7569. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33492-4_6

  • DOI: https://doi.org/10.1007/978-3-642-33492-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33491-7

  • Online ISBN: 978-3-642-33492-4

  • eBook Packages: Computer Science (R0)
