
Robust Bayesian Reinforcement Learning through Tight Lower Bounds

  • Conference paper
Recent Advances in Reinforcement Learning (EWRL 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7188)


Abstract

In the Bayesian approach to sequential decision making, exact calculation of the (subjective) utility is intractable. This extends to most special cases of interest, such as reinforcement learning problems. While utility bounds are known to exist for this problem, so far none of them has been particularly tight. In this paper, we show how to efficiently calculate a lower bound corresponding to the utility of a near-optimal memoryless policy for the decision problem. This policy is generally different from both the Bayes-optimal policy and the policy that is optimal for the expected MDP under the current belief. We then show how these bounds can be applied to obtain robust exploration policies in a Bayesian reinforcement learning setting.
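
The lower bound described above has a simple Monte Carlo interpretation: the expected utility of any fixed memoryless policy under the current belief can never exceed that of the Bayes-optimal policy, so sampling MDPs from the posterior and evaluating such a policy on each sample gives an unbiased lower-bound estimate. The Python sketch below illustrates only this general principle, not the paper's algorithm; it assumes an independent Dirichlet posterior over each transition row with known rewards, and all names (sample_mdp, evaluate, policy_iteration) are illustrative. As the fixed policy it uses the one that is optimal for the expected (mean) MDP, which, as the abstract notes, is generally not the policy achieving the tightest bound.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.95

# Hypothetical posterior: an independent Dirichlet belief over each
# transition row P(. | s, a), e.g. prior pseudo-counts plus observations.
counts = rng.integers(1, 10, size=(n_states, n_actions, n_states))

# Rewards are taken as known here; the Bayesian setting also covers
# reward uncertainty, omitted to keep the sketch short.
R = rng.random((n_states, n_actions))

def sample_mdp():
    """Draw one transition model from the Dirichlet posterior."""
    return np.array([[rng.dirichlet(counts[s, a])
                      for a in range(n_actions)] for s in range(n_states)])

def evaluate(policy, P):
    """Exact evaluation of a memoryless (stationary) policy on one MDP."""
    P_pi = np.einsum('sa,sat->st', policy, P)   # transition kernel under the policy
    r_pi = np.einsum('sa,sa->s', policy, R)     # expected reward per state
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def policy_iteration(P):
    """Optimal deterministic memoryless policy for a single known MDP."""
    policy = np.full((n_states, n_actions), 1.0 / n_actions)
    for _ in range(100):
        V = evaluate(policy, P)
        Q = R + gamma * np.einsum('sat,t->sa', P, V)
        greedy = np.eye(n_actions)[Q.argmax(axis=1)]
        if np.allclose(greedy, policy):
            break
        policy = greedy
    return policy

# Fixed memoryless policy: optimal for the expected (mean) MDP.
P_mean = counts / counts.sum(axis=-1, keepdims=True)
pi_mean = policy_iteration(P_mean)

# Monte Carlo lower bound on the Bayes-optimal utility at start state 0:
# the posterior-expected value of any fixed memoryless policy is a valid
# lower bound, since the Bayes-optimal policy can only do better.
samples = [sample_mdp() for _ in range(32)]
lower_bound = np.mean([evaluate(pi_mean, P)[0] for P in samples])
print(f"Monte Carlo lower bound (mean-MDP policy): {lower_bound:.3f}")

A tighter bound would instead search over memoryless policies to (approximately) maximise this posterior-expected value, which is the direction the paper pursues.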




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dimitrakakis, C. (2012). Robust Bayesian Reinforcement Learning through Tight Lower Bounds. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science (LNAI), vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_19


  • DOI: https://doi.org/10.1007/978-3-642-29946-9_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29945-2

  • Online ISBN: 978-3-642-29946-9

  • eBook Packages: Computer Science, Computer Science (R0)
