Sparse Kernel-Based Least Squares Temporal Difference with Prioritized Sweeping
Improving the efficiency of algorithms for large-scale or continuous-space reinforcement learning (RL) problems has been an active research topic. The kernel-based least squares temporal difference (KLSTD) algorithm can solve continuous-space RL problems, but it suffers from high computational complexity due to its kernel evaluations and matrix computations. To address this problem, this paper proposes an algorithm named sparse kernel-based least squares temporal difference with prioritized sweeping (PS-SKLSTD). PS-SKLSTD consists of two parts: learning and planning. In the learning process, we exploit an approximate linear dependence (ALD)-based sparse kernel function to represent the value function and update the parameter vectors based on the Sherman-Morrison formula. In the planning process, we use prioritized sweeping to select the state-action pair to update next. The experimental results demonstrate that PS-SKLSTD outperforms KLSTD in both convergence and computational efficiency.
Keywords: Reinforcement learning · Prioritized sweeping · Sparse kernel · Least squares temporal difference
This work was supported by the National Natural Science Foundation of China (61103045, 61272005, 61272244, 61303108, 61373094, 61472262), the Natural Science Foundation of Jiangsu Province (BK2012616), the Jiangsu Higher Education Natural Science Foundation (13KJB520020), the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K04), and the Suzhou Industrial Application of Basic Research Program (SYG201422).