Abstract
Over the past several decades, kernel-based reinforcement learning (KBRL) methods have been a research hotspot as an important class of solutions to large-scale or continuous-space control problems. However, existing sample sparsification methods for KBRL suffer from low time efficiency and poor effectiveness. To address this problem, we propose a new sample sparsification method, the clustering-based novelty criterion (CNC), which combines a clustering algorithm with a distance-based novelty criterion. Building on CNC, we also propose clustering-based selective kernel Sarsa(\(\lambda \)) (CSKS(\(\lambda \))), which applies Sarsa(\(\lambda \)) to learn the parameters of a selective kernel-based value function based on local validity. Finally, we show in an Acrobot experiment that CSKS(\(\lambda \)) outperforms other state-of-the-art algorithms.
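As an illustration only, the idea of combining clustering with a distance-based novelty criterion can be sketched as follows. This is not the authors' CNC procedure, whose details are not given in this abstract. The sketch assumes that cluster centers are already available (for example from k-means), that distance is Euclidean, and that a sample counts as novel when it lies farther than a threshold `mu` from every dictionary element stored in its nearest cluster. The names `nearest_center`, `sparsify`, `centers`, and `mu` are hypothetical.

```python
import math


def nearest_center(x, centers):
    """Return the index of the cluster center closest to x (Euclidean)."""
    return min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))


def sparsify(samples, centers, mu):
    """Build a sparse sample dictionary with one bucket per cluster.

    Assumption (not from the paper): a sample is kept only if it lies
    farther than `mu` from every element already stored in the bucket of
    its nearest cluster, so each novelty test scans a single bucket
    rather than the whole dictionary.
    """
    buckets = [[] for _ in centers]
    for x in samples:
        bucket = buckets[nearest_center(x, centers)]
        if all(math.dist(x, d) > mu for d in bucket):
            bucket.append(x)
    return buckets


# Toy data: two well-separated groups of 2-D samples.
centers = [(0.0, 0.0), (10.0, 10.0)]
samples = [
    (0.0, 0.0), (0.1, 0.0), (1.0, 0.0),
    (10.0, 10.0), (10.2, 10.0), (9.0, 10.0),
]
buckets = sparsify(samples, centers, mu=0.5)
print(buckets)
```

Under these assumptions, restricting the comparison to one bucket is what would reduce the cost of each novelty test relative to scanning the full dictionary; samples near a cluster boundary may be tested against the wrong neighbors, which is a limitation of this simplified version.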
Notes
1. \(N\ge M\) and the matrix \(\mathscr {K}=[\varvec{k}(s_1),\varvec{k}(s_2),...,\varvec{k}(s_N)]\) has full rank, where N is the size of the state-action space.
Acknowledgement
This work was funded by the National Science Foundation of China (61303108, 61373094, 61472262), the Natural Science Foundation of Jiangsu (BK2012616), the High School Natural Foundation of Jiangsu (13KJB520020), the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04), the Suzhou Industrial application of basic research program part (SYG201422), and the Provincial Key Laboratory for Computer Information Processing Technology, Soochow University (KJS1524).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Zhu, H. et al. (2016). A Kernel-Based Sarsa(\(\lambda \)) Algorithm with Clustering-Based Sample Sparsification. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9949. Springer, Cham. https://doi.org/10.1007/978-3-319-46675-0_24
DOI: https://doi.org/10.1007/978-3-319-46675-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46674-3
Online ISBN: 978-3-319-46675-0
eBook Packages: Computer Science, Computer Science (R0)