
Robust Bayesian Reinforcement Learning through Tight Lower Bounds

  • Conference paper
Recent Advances in Reinforcement Learning (EWRL 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7188)


Abstract

In the Bayesian approach to sequential decision making, exact calculation of the (subjective) utility is intractable. This extends to most special cases of interest, such as reinforcement learning problems. While utility bounds are known to exist for this problem, so far none of them has been particularly tight. In this paper, we show how to efficiently calculate a lower bound corresponding to the utility of a near-optimal memoryless policy for the decision problem. This policy is generally different from both the Bayes-optimal policy and the policy that is optimal for the expected MDP under the current belief. We then show how these bounds can be applied to obtain robust exploration policies in a Bayesian reinforcement learning setting.
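
The lower bound described above has a simple Monte Carlo interpretation: the expected utility of any fixed memoryless policy under the current belief can never exceed that of the Bayes-optimal policy, so sampling MDPs from the posterior and evaluating such a policy on each sample gives an unbiased lower-bound estimate. The Python sketch below illustrates only this general principle, not the paper's algorithm; it assumes an independent Dirichlet posterior over each transition row with known rewards, and all names (sample_mdp, evaluate, policy_iteration) are illustrative. As the fixed policy it uses the one that is optimal for the expected (mean) MDP, which, as the abstract notes, is generally not the policy achieving the tightest bound.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.95

# Hypothetical posterior: an independent Dirichlet belief over each
# transition row P(. | s, a), e.g. prior pseudo-counts plus observations.
counts = rng.integers(1, 10, size=(n_states, n_actions, n_states))

# Rewards are taken as known here; the Bayesian setting also covers
# reward uncertainty, omitted to keep the sketch short.
R = rng.random((n_states, n_actions))

def sample_mdp():
    """Draw one transition model from the Dirichlet posterior."""
    return np.array([[rng.dirichlet(counts[s, a])
                      for a in range(n_actions)] for s in range(n_states)])

def evaluate(policy, P):
    """Exact evaluation of a memoryless (stationary) policy on one MDP."""
    P_pi = np.einsum('sa,sat->st', policy, P)   # transition kernel under the policy
    r_pi = np.einsum('sa,sa->s', policy, R)     # expected reward per state
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def policy_iteration(P):
    """Optimal deterministic memoryless policy for a single known MDP."""
    policy = np.full((n_states, n_actions), 1.0 / n_actions)
    for _ in range(100):
        V = evaluate(policy, P)
        Q = R + gamma * np.einsum('sat,t->sa', P, V)
        greedy = np.eye(n_actions)[Q.argmax(axis=1)]
        if np.allclose(greedy, policy):
            break
        policy = greedy
    return policy

# Fixed memoryless policy: optimal for the expected (mean) MDP.
P_mean = counts / counts.sum(axis=-1, keepdims=True)
pi_mean = policy_iteration(P_mean)

# Monte Carlo lower bound on the Bayes-optimal utility at start state 0:
# the posterior-expected value of any fixed memoryless policy is a valid
# lower bound, since the Bayes-optimal policy can only do better.
samples = [sample_mdp() for _ in range(32)]
lower_bound = np.mean([evaluate(pi_mean, P)[0] for P in samples])
print(f"Monte Carlo lower bound (mean-MDP policy): {lower_bound:.3f}")

A tighter bound would instead search over memoryless policies to (approximately) maximise this posterior-expected value, which is the direction the paper pursues.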




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dimitrakakis, C. (2012). Robust Bayesian Reinforcement Learning through Tight Lower Bounds. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science (LNAI), vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_19


  • DOI: https://doi.org/10.1007/978-3-642-29946-9_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29945-2

  • Online ISBN: 978-3-642-29946-9

  • eBook Packages: Computer Science, Computer Science (R0)
