A Kernel-Based Sarsa($\lambda$) Algorithm with Clustering-Based Sample Sparsification

Zhu, Haijun; Zhu, Fei; Fu, Yuchen; Liu, Quan; Zhai, Jianwei; Sun, Cijia; Zhang, Peng

doi:10.1007/978-3-319-46675-0_24

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9949)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Over the past several decades, kernel-based reinforcement learning (KBRL) methods have been a research hotspot as a significant class of solutions to large-scale or continuous-space control problems. However, existing sample sparsification methods for KBRL suffer from low time efficiency and poor sparsification results. To address this, we propose a new sample sparsification method, the clustering-based novelty criterion (CNC), which combines a clustering algorithm with a distance-based novelty criterion. We further propose a clustering-based selective kernel Sarsa($\lambda$) (CSKS($\lambda$)) algorithm built on CNC, which applies Sarsa($\lambda$) to learn the parameters of a selective kernel-based value function based on local validity. Finally, we show in an Acrobot experiment that CSKS($\lambda$) outperforms other state-of-the-art algorithms.
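As a rough illustration of the distance-based novelty criterion mentioned in the abstract, the sketch below keeps a sample only when it lies farther than a threshold from every sample already stored. The abstract does not say how CNC couples the clustering step to this test, so that part is left out. The function names, the threshold `mu`, the Gaussian kernel, and the toy samples are all assumptions made for illustration and are not taken from the paper.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sparsify(samples, mu):
    """Plain distance-based novelty criterion (not the paper's CNC):
    keep a sample only if it is farther than `mu` from every kept sample."""
    dictionary = []
    for s in samples:
        if all(distance(s, d) > mu for d in dictionary):
            dictionary.append(s)
    return dictionary

def kernel_features(state, dictionary, sigma=1.0):
    """Feature vector k(s) over the dictionary, using an assumed Gaussian kernel."""
    return [math.exp(-distance(state, d) ** 2 / (2 * sigma ** 2))
            for d in dictionary]

# Hypothetical 2-D samples; near-duplicates are filtered out.
samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 0.95), (2.0, 0.0)]
dictionary = sparsify(samples, mu=0.5)
features = kernel_features((0.0, 0.0), dictionary)
```

Each new sample is compared against the whole dictionary, so the cost of one test grows with the dictionary size. This is the kind of time cost that a clustering-based lookup could plausibly reduce.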

Notes

1. $N \ge M$ and the matrix $\mathscr{K} = [\boldsymbol{k}(s_1), \boldsymbol{k}(s_2), \ldots, \boldsymbol{k}(s_N)]$ has full rank, where $N$ is the size of the state-action space.

Acknowledgement

This work was funded by the National Science Foundation of China (61303108, 61373094, 61472262), the Natural Science Foundation of Jiangsu (BK2012616), the High School Natural Foundation of Jiangsu (13KJB520020), the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04), the Suzhou Industrial application of basic research program part (SYG201422), and the Provincial Key Laboratory for Computer Information Processing Technology, Soochow University (KJS1524).

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, 215000, China
Haijun Zhu, Fei Zhu, Yuchen Fu, Quan Liu, Jianwei Zhai, Cijia Sun & Peng Zhang
Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, China
Fei Zhu
School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, 215500, China
Yuchen Fu

Corresponding author

Correspondence to Yuchen Fu.

Editor information

Editors and Affiliations

The University of Tokyo, Tokyo, Japan
Akira Hirose
Kobe University, Kobe, Japan
Seiichi Ozawa
Okinawa Institute of Science and Technology Graduate University, Onna, Japan
Kenji Doya
Nara Institute of Science and Technology, Ikoma, Japan
Kazushi Ikeda
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Chinese Academy of Sciences, Beijing, China
Derong Liu

Cite this paper

Zhu, H. et al. (2016). A Kernel-Based Sarsa($\lambda$) Algorithm with Clustering-Based Sample Sparsification. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9949. Springer, Cham. https://doi.org/10.1007/978-3-319-46675-0_24

DOI: https://doi.org/10.1007/978-3-319-46675-0_24
Published: 29 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46674-3
Online ISBN: 978-3-319-46675-0
eBook Packages: Computer Science, Computer Science (R0)
