A Kernel-Based Sarsa(\(\lambda \)) Algorithm with Clustering-Based Sample Sparsification

  • Haijun Zhu
  • Fei Zhu
  • Yuchen Fu
  • Quan Liu
  • Jianwei Zhai
  • Cijia Sun
  • Peng Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9949)

Abstract

In the past several decades, kernel-based reinforcement learning (KBRL) methods have been a research hotspot, as they form a significant class of solutions to large-scale or continuous-space control problems. However, existing sample sparsification methods for KBRL suffer from low time efficiency and poor sparsification quality. To address this problem, we propose a new sample sparsification method, the clustering-based novelty criterion (CNC), which combines a clustering algorithm with a distance-based novelty criterion. On the basis of CNC, we further propose a clustering-based selective kernel Sarsa(\(\lambda \)) algorithm (CSKS(\(\lambda \))), which applies Sarsa(\(\lambda \)) to learn the parameters of a selective kernel-based value function based on local validity. Finally, we show that CSKS(\(\lambda \)) surpasses other state-of-the-art algorithms in the Acrobot experiment.
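To make the sparsification idea concrete, the following is a minimal Python sketch of a clustering-based novelty criterion. It is illustrative only: the plain k-means routine, the per-cluster distance test, and all names and defaults (cnc_sparsify, novelty_threshold) are our assumptions for the example, not the paper's exact CNC procedure.

```python
import numpy as np

def cnc_sparsify(samples, n_clusters=10, novelty_threshold=0.5, seed=0):
    """Illustrative clustering-based novelty criterion (CNC) sketch.

    Samples are grouped by plain k-means; a sample is then admitted to the
    kernel dictionary only if it is farther than `novelty_threshold` from
    every element of its cluster that is already in the dictionary.
    """
    rng = np.random.default_rng(seed)
    # k-means (Lloyd's algorithm) with a few fixed iterations
    centers = samples[rng.choice(len(samples), n_clusters, replace=False)]
    for _ in range(20):
        dists = np.linalg.norm(samples[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = samples[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    # distance-based novelty test, applied within each cluster
    dictionary, dict_labels = [], []
    for x, k in zip(samples, labels):
        gaps = [np.linalg.norm(x - d)
                for d, kk in zip(dictionary, dict_labels) if kk == k]
        if not gaps or min(gaps) > novelty_threshold:
            dictionary.append(x)
            dict_labels.append(k)
    return np.array(dictionary)

# usage: sparsify 1000 random 2-D states into a compact dictionary
states = np.random.default_rng(1).normal(size=(1000, 2))
print(cnc_sparsify(states).shape)
```

Dense regions of the state space thus contribute only a few dictionary elements, while isolated samples are retained, which is the intended effect of combining clustering with a distance-based novelty test.

The learning half can be sketched in the same hedged spirit: Sarsa(\(\lambda \)) with eligibility traces over a kernel expansion on the sparsified dictionary. The Gaussian kernel, accumulating traces, and the single weight vector below are simplifying assumptions; the selective, local-validity-based value function of CSKS(\(\lambda \)) is omitted.

```python
import numpy as np

def gaussian_kernel(x, c, sigma=1.0):
    # RBF similarity between a state x and a dictionary element c
    return np.exp(-np.linalg.norm(x - c) ** 2 / (2.0 * sigma ** 2))

class KernelSarsaLambda:
    """Sarsa(lambda) over a fixed kernel dictionary (illustrative sketch)."""

    def __init__(self, dictionary, alpha=0.1, gamma=0.99, lam=0.9):
        self.dictionary = dictionary
        self.w = np.zeros(len(dictionary))   # kernel expansion weights
        self.e = np.zeros(len(dictionary))   # eligibility traces
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def features(self, s):
        # kernel activations of state s against every dictionary element
        return np.array([gaussian_kernel(s, c) for c in self.dictionary])

    def value(self, s):
        return float(self.w @ self.features(s))

    def update(self, s, r, s_next, done):
        phi = self.features(s)
        target = r if done else r + self.gamma * self.value(s_next)
        delta = target - float(self.w @ phi)            # TD error
        self.e = self.gamma * self.lam * self.e + phi   # accumulating trace
        self.w = self.w + self.alpha * delta * self.e
        if done:
            self.e[:] = 0.0                             # reset at episode end

# usage: any compact state dictionary works, e.g. the CNC output above
agent = KernelSarsaLambda(np.random.default_rng(1).normal(size=(25, 2)))
agent.update(np.zeros(2), r=1.0, s_next=np.ones(2), done=False)
print(agent.value(np.ones(2)))
```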

Keywords

Reinforcement learning · Kernel method · Sample sparsification · Clustering · Sarsa(\(\lambda \))

Acknowledgement

This work was funded by the National Science Foundation of China (61303108, 61373094, 61472262), the Natural Science Foundation of Jiangsu (BK2012616), the High School Natural Foundation of Jiangsu (13KJB520020), the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K04), the Suzhou Industrial Application of Basic Research Program (SYG201422), and the Provincial Key Laboratory for Computer Information Processing Technology, Soochow University (KJS1524).

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Haijun Zhu
    • 1
  • Fei Zhu
    • 1
    • 2
  • Yuchen Fu
    • 1
    • 3
  • Quan Liu
    • 1
  • Jianwei Zhai
    • 1
  • Cijia Sun
    • 1
  • Peng Zhang
    • 1
  1. School of Computer Science and Technology, Soochow University, Suzhou, China
  2. Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, China
  3. School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, China
