Abstract
In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple closed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem in which each arm corresponds to a candidate policy, canonically represented by its shortest formula-based representation. Experiments conducted on standard benchmarks show that this approach discovers solutions that are both efficient and interpretable.
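The bandit formulation described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's method: it assumes a hypothetical toy control problem (walking on the integer line toward the origin) and a hand-picked set of candidate ranking formulas, then runs UCB1 where each arm is one candidate policy and each pull evaluates that policy for one episode.

```python
import math

# Hypothetical toy problem: walk on the integer line, start at 5;
# after each step the reward is -|position|; episodes last 10 steps.
def run_episode(policy, horizon=10, start=5):
    pos = start
    total = 0.0
    for _ in range(horizon):
        # The formula ranks the candidate actions; take the best-scoring one.
        action = max((-1, +1), key=lambda a: policy(pos, a))
        pos += action
        total += -abs(pos)
    return total

# Illustrative candidate formulas (not the paper's formula grammar):
# each maps (state, action) to a score used to rank actions.
candidates = {
    "-|s + a|": lambda s, a: -abs(s + a),  # move toward the origin
    "s * a":    lambda s, a: s * a,        # move away from the origin
    "a":        lambda s, a: a,            # always move right
    "-a":       lambda s, a: -a,           # always move left
}

def ucb1_search(candidates, budget=400):
    """Treat each candidate formula as a bandit arm; return the best one."""
    names = list(candidates)
    counts = {n: 0 for n in names}
    sums = {n: 0.0 for n in names}
    for t in range(1, budget + 1):
        if t <= len(names):
            # Initialization: pull each arm once.
            name = names[t - 1]
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            name = max(names, key=lambda n: sums[n] / counts[n]
                       + math.sqrt(2 * math.log(t) / counts[n]))
        reward = run_episode(candidates[name])
        counts[name] += 1
        sums[name] += reward
    return max(names, key=lambda n: sums[n] / counts[n])

print(ucb1_search(candidates))  # prints "-|s + a|"
```

The point of the formulation is that the search never inspects the formulas themselves; it only needs a way to evaluate each candidate policy by simulation, and the bandit allocates the evaluation budget toward the most promising candidates.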
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Maes, F., Fonteneau, R., Wehenkel, L., Ernst, D. (2012). Policy Search in a Space of Simple Closed-form Formulas: Towards Interpretability of Reinforcement Learning. In: Ganascia, JG., Lenca, P., Petit, JM. (eds) Discovery Science. DS 2012. Lecture Notes in Computer Science(), vol 7569. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33492-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33491-7
Online ISBN: 978-3-642-33492-4