A Bayesian Posterior Updating Algorithm in Reinforcement Learning

  • Fangzhou Xiong
  • Zhiyong Liu (Email author)
  • Xu Yang
  • Biao Sun
  • Charles Chiu
  • Hong Qiao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10638)

Abstract

Bayesian reinforcement learning (BRL) is an important approach to reinforcement learning (RL) that uses methods from Bayesian inference to incorporate prior information into the learning process when the agent interacts directly with the environment, without depending on exemplary supervision or complete models of the environment. BRL expresses prior information as probability distributions that quantify uncertainty, and updates these distributions as evidence is collected. However, the expected total discounted rewards needed to maintain these distributions cannot be obtained immediately after each transition the agent executes. In this paper, we propose a novel idea: adjusting immediate rewards slightly during the Bayesian Q-learning update by introducing a state pool technique, which can improve the total rewards accrued over a period of time when the pool is reset appropriately. We show experimentally on several fundamental BRL problems that the proposed method can achieve substantial improvements over other traditional strategies.
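
The idea of holding a distribution over an unknown quantity and updating it as evidence arrives can be illustrated with a small conjugate update. The sketch below is not the algorithm proposed in the paper, and it does not attempt to reproduce the state pool technique, whose details are not given in the abstract. It assumes a normal-gamma prior over the mean and precision of a Gaussian-distributed value, and the names `NormalGamma` and `update` are hypothetical. The observation `x` stands in for a sampled return, which, as the abstract notes, is not directly available after a single transition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalGamma:
    """Hypothetical container for normal-gamma hyperparameters."""
    mu: float     # prior mean of the unknown value
    lam: float    # pseudo-count behind the mean estimate
    alpha: float  # shape of the gamma prior on the precision
    beta: float   # rate of the gamma prior on the precision


def update(prior: NormalGamma, x: float) -> NormalGamma:
    """Standard conjugate update after observing one sample x.

    Assumption: x is treated as a direct sample of the quantity being
    estimated; in RL such a sample would have to be approximated.
    """
    lam_new = prior.lam + 1.0
    mu_new = (prior.lam * prior.mu + x) / lam_new
    alpha_new = prior.alpha + 0.5
    beta_new = prior.beta + prior.lam * (x - prior.mu) ** 2 / (2.0 * lam_new)
    return NormalGamma(mu_new, lam_new, alpha_new, beta_new)


# Usage: start from a weak prior and fold in one observation.
prior = NormalGamma(mu=0.0, lam=1.0, alpha=1.0, beta=1.0)
posterior = update(prior, 2.0)
```

Each observation moves the mean toward the data and raises the pseudo-count `lam`, so later samples shift the estimate less; this is the sense in which the distribution quantifies the remaining uncertainty.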

Keywords

Bayesian reinforcement learning, Bayesian Q-learning, State pool technique

Notes

Acknowledgments

This work is partly supported by NSFC grants 61375005, U1613213, and 61210009, MOST grants 2015BAK35B00 and 2015BAK35B01, and Guangdong Science and Technology Department grant 2016B090910001.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Fangzhou Xiong (1, 2)
  • Zhiyong Liu (1, 2, 3, 5) (Email author)
  • Xu Yang (1)
  • Biao Sun (4)
  • Charles Chiu (6)
  • Hong Qiao (1, 2, 3, 4, 5)

  1. The State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Science, Beijing, China
  2. School of Computer and Control, University of Chinese Academy of Sciences (UCAS), Beijing, China
  3. CAS Centre for Excellence in Brain Science and Intelligence Technology (CEBSIT), Shanghai, China
  4. University of Science and Technology Beijing, Beijing, China
  5. Cloud Computing Center, Chinese Academy of Sciences, DongGuan, China
  6. School for Higher and Professional Education, Chai Wan, Hong Kong, China
