
A Kernel-Based Sarsa(\(\lambda \)) Algorithm with Clustering-Based Sample Sparsification

  • Conference paper
Neural Information Processing (ICONIP 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9949)


Abstract

In the past several decades, kernel-based reinforcement learning (KBRL) methods have been a research hotspot as a significant class of solutions to large-scale or continuous-space control problems. However, the existing sample sparsification methods for KBRL suffer from low time efficiency and poor effectiveness. To address this problem, we propose a new sample sparsification method, the clustering-based novelty criterion (CNC), which combines a clustering algorithm with a distance-based novelty criterion. Building on CNC, we further propose clustering-based selective kernel Sarsa(\(\lambda \)) (CSKS(\(\lambda \))), which uses Sarsa(\(\lambda \)) to learn the parameters of a selective kernel-based value function based on local validity. Finally, experiments on the Acrobot task show that CSKS(\(\lambda \)) outperforms other state-of-the-art algorithms.
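The abstract describes CNC and CSKS(\(\lambda \)) only at a high level, so the following Python sketch is one plausible reading, not the authors' method. It assumes k-means as the clustering step (the abstract does not name the clustering algorithm), a Euclidean novelty threshold `mu`, a Gaussian kernel of width `sigma`, and one weight vector per discrete action; all function and class names are hypothetical. The learner is ordinary linear Sarsa(\(\lambda \)) with accumulating traces over kernel features of the sparsified dictionary; it does not reproduce the selective, locally valid kernel that CSKS(\(\lambda \)) uses.

```python
import math
import random


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans_labels(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption: k-means stands in for the unspecified clustering algorithm.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def cnc_sparsify(samples, k, mu):
    """Hypothetical CNC-style sparsification: cluster the samples, then run a
    distance-based novelty test only against entries from the same cluster."""
    labels = kmeans_labels(samples, k)
    dictionary = []
    for j in range(k):
        kept = []  # dictionary entries already chosen from cluster j
        for p, lab in zip(samples, labels):
            # Novelty criterion: keep p if it is farther than mu from all kept entries.
            if lab == j and all(math.dist(p, q) > mu for q in kept):
                kept.append(p)
        dictionary.extend(kept)
    return dictionary


def kernel_features(s, dictionary, sigma=1.0):
    """Gaussian kernel vector k(s) over all dictionary entries."""
    return [math.exp(-sq_dist(s, d) / (2 * sigma ** 2)) for d in dictionary]


class KernelSarsaLambda:
    """Hypothetical linear Sarsa(lambda) with accumulating traces on kernel
    features, one weight vector per discrete action (no selective/local-validity step)."""

    def __init__(self, dictionary, n_actions, alpha=0.1, gamma=0.99, lam=0.9, sigma=1.0):
        self.dictionary, self.sigma = dictionary, sigma
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        m = len(dictionary)
        self.theta = [[0.0] * m for _ in range(n_actions)]
        self.trace = [[0.0] * m for _ in range(n_actions)]

    def q(self, s, a):
        """Linear action value Q(s, a) = theta_a . k(s)."""
        k = kernel_features(s, self.dictionary, self.sigma)
        return sum(w * x for w, x in zip(self.theta[a], k))

    def update(self, s, a, r, s_next, a_next, done):
        """Apply one Sarsa(lambda) step and return the TD error."""
        target = r if done else r + self.gamma * self.q(s_next, a_next)
        delta = target - self.q(s, a)
        decay = self.gamma * self.lam
        k = kernel_features(s, self.dictionary, self.sigma)
        for b in range(len(self.trace)):
            self.trace[b] = [decay * e for e in self.trace[b]]
        # Accumulating trace: the gradient of Q(s, a) is k(s) in action block a.
        self.trace[a] = [e + x for e, x in zip(self.trace[a], k)]
        for b in range(len(self.theta)):
            self.theta[b] = [w + self.alpha * delta * e
                             for w, e in zip(self.theta[b], self.trace[b])]
        if done:  # traces do not carry over between episodes
            self.trace = [[0.0] * len(t) for t in self.trace]
        return delta
```

Comparing each candidate only with entries from its own cluster, rather than with the whole dictionary, is one way clustering could reduce the cost of a distance-based novelty test; the trade-off in this sketch is that entries on opposite sides of a cluster boundary may end up closer than `mu` to each other.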


Notes

  1. We assume \(N\ge M\) and that the matrix \(\mathscr {K}=[\boldsymbol{k}(s_1),\boldsymbol{k}(s_2),\dots,\boldsymbol{k}(s_N)]\) has full rank, where \(N\) is the size of the state-action space.
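The full-rank condition in the note can be checked numerically for a concrete sample set. This Python sketch assumes a Gaussian kernel and reads \(M\) as the number of dictionary samples (the excerpt does not define \(M\)), so \(\mathscr {K}\) is \(M\times N\) and full rank means rank \(M\); the helpers `kernel_matrix`, `matrix_rank`, and `has_full_rank` are hypothetical, and the pivot tolerance `tol` is an arbitrary choice.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length points (assumed kernel)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))


def kernel_matrix(samples, dictionary, sigma=1.0):
    """Build the M x N matrix whose i-th column is k(s_i).

    Row j holds kappa(d_j, s_1..s_N), so column i is
    k(s_i) = [kappa(s_i, d_1), ..., kappa(s_i, d_M)] (the kernel is symmetric).
    """
    return [[gaussian_kernel(d, s, sigma) for s in samples] for d in dictionary]


def matrix_rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    a = [list(r) for r in rows]
    n_rows, n_cols = len(a), (len(a[0]) if a else 0)
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break  # every row already has a pivot
        pivot = max(range(rank, n_rows), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(rank + 1, n_rows):
            factor = a[i][col] / a[rank][col]
            a[i] = [x - factor * y for x, y in zip(a[i], a[rank])]
        rank += 1
    return rank


def has_full_rank(samples, dictionary, sigma=1.0):
    """Check the note's condition: N >= M and rank(K) == M."""
    m, n = len(dictionary), len(samples)
    return n >= m and matrix_rank(kernel_matrix(samples, dictionary, sigma)) == m
```

If two dictionary entries coincide, their rows of \(\mathscr {K}\) are identical and the rank falls below \(M\), so the condition fails; screening out near-duplicate samples, as a novelty criterion does, avoids that particular failure.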


Acknowledgement

This work was funded by the National Science Foundation of China (61303108, 61373094, 61472262), the Natural Science Foundation of Jiangsu (BK2012616), the High School Natural Foundation of Jiangsu (13KJB520020), the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04), the Suzhou Industrial application of basic research program part (SYG201422), and the Provincial Key Laboratory for Computer Information Processing Technology, Soochow University (KJS1524).

Author information


Correspondence to Yuchen Fu.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Zhu, H. et al. (2016). A Kernel-Based Sarsa(\(\lambda \)) Algorithm with Clustering-Based Sample Sparsification. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9949. Springer, Cham. https://doi.org/10.1007/978-3-319-46675-0_24


  • DOI: https://doi.org/10.1007/978-3-319-46675-0_24


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46674-3

  • Online ISBN: 978-3-319-46675-0

  • eBook Packages: Computer Science, Computer Science (R0)
