A Bayesian Posterior Updating Algorithm in Reinforcement Learning
Bayesian reinforcement learning (BRL) is an important approach to reinforcement learning (RL) that uses methods from Bayesian inference to incorporate prior information into the learning process while the agent interacts directly with its environment, without relying on exemplary supervision or a complete model of the environment. BRL expresses prior information as probability distributions that quantify uncertainty, and updates these distributions as evidence is collected. However, the expected total discounted rewards needed to maintain these distributions cannot be obtained immediately after each transition the agent executes. In this paper, we propose a novel idea: slightly adjusting immediate rewards during Bayesian Q-learning updates by introducing a state pool technique, which can improve the total reward accrued over a period of time when the pool is reset appropriately. We show experimentally on several fundamental BRL problems that the proposed method achieves substantial improvements over traditional strategies.
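As background for the updating scheme the abstract describes, the following is a minimal sketch of classical Bayesian Q-learning in the style of Dearden et al.: each state-action pair carries a conjugate normal-gamma posterior over its return, actions can be chosen by Thompson-style sampling, and each transition feeds a bootstrapped return into the posterior. This illustrates only the standard posterior update, not the paper's state pool or reward-adjustment technique; all names (`NormalGamma`, `bayesian_q_step`) and the prior hyperparameters are illustrative assumptions.

```python
import math
import random


class NormalGamma:
    """Conjugate normal-gamma posterior over the unknown mean and precision
    of a return distribution, as in Bayesian Q-learning (Dearden et al.).
    Hyperparameter defaults here are illustrative, not from the paper."""

    def __init__(self, mu=0.0, lam=1.0, alpha=1.1, beta=1.0):
        self.mu, self.lam, self.alpha, self.beta = mu, lam, alpha, beta

    def update(self, x):
        """Standard conjugate update after observing one return sample x."""
        mu, lam = self.mu, self.lam
        self.mu = (lam * mu + x) / (lam + 1.0)
        self.lam = lam + 1.0
        self.alpha += 0.5
        self.beta += 0.5 * lam * (x - mu) ** 2 / (lam + 1.0)

    def sample_mean(self, rng):
        """Thompson-style draw of a plausible mean Q-value:
        draw precision ~ Gamma, then mean ~ Normal."""
        tau = rng.gammavariate(self.alpha, 1.0 / self.beta)
        return rng.gauss(self.mu, 1.0 / math.sqrt(self.lam * tau))


def bayesian_q_step(posteriors, state, action, reward, next_state, gamma):
    """One Bayesian Q-learning update: treat the bootstrapped return
    r + gamma * max_a' E[Q(s', a')] as a sample from the return
    distribution of (state, action)."""
    next_value = max(p.mu for p in posteriors[next_state].values())
    posteriors[state][action].update(reward + gamma * next_value)
```

An agent would then act by drawing `sample_mean` for each action in the current state and taking the argmax, so that exploration is driven by posterior uncertainty rather than by an epsilon schedule.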
Keywords: Bayesian reinforcement learning · Bayesian Q-learning · State pool technique
This work is partly supported by NSFC grants 61375005, U1613213, 61210009, MOST grants 2015BAK35B00, 2015BAK35B01, Guangdong Science and Technology Department grant 2016B090910001.